Stock Market Data for Tesla between \(2016\) and \(2022\)
The dataset used in the following statistical analysis is stock
market data for Elon Musk’s publicly traded companies, Tesla
and Twitter, from \(01/01/2018\) to \(05/20/2022\). We obtained the data from
the Yahoo Finance API using the quantmod package in R. The
data contains the daily percentage returns for the Tesla and Twitter
stocks, with \(2208\) observations on the following \(13\) variables.
| variable | type | description |
|---|---|---|
| symbol | character | The ticker symbol uniquely identifying a stock |
| date | datetime | The trade day of the recorded observation |
| open | float | Opening value of the stock that day |
| close | float | Closing value of the stock that day |
| high | float | Highest price of the stock on a given trade day |
| low | float | Lowest price of the stock on a given trade day |
| volume | integer | Number of daily shares traded in billions |
| direction | factor | Factor indicating whether the market had a positive or negative return |
| return | decimal | Percentage return for that day |
| lag1 | decimal | Percentage return for previous day |
| lag2 | decimal | Percentage return for 2 days previous |
| lag3 | decimal | Percentage return for 3 days previous |
| lag4 | decimal | Percentage return for 4 days previous |
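The derived columns in the table (return, lag1 to lag4, and direction) can be illustrated with a minimal sketch. It assumes a simple percentage return computed from consecutive closing prices; the toy prices, the helper `lag_k`, and the factor labels "Up" and "Down" are hypothetical and are not taken from the original analysis.

```r
# Toy closing prices standing in for one ticker; the real series would come
# from Yahoo Finance, e.g. via quantmod::getSymbols("TSLA", src = "yahoo").
close <- c(100, 102, 101, 105, 104, 108)

# Simple daily percentage return: 100 * (P_t - P_{t-1}) / P_{t-1}.
# The first day has no previous close, so its return is NA.
ret <- c(NA, 100 * diff(close) / head(close, -1))

# Hypothetical helper: shift a vector back by k trading days, padding with NA.
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

stocks <- data.frame(
  close     = close,
  return    = ret,
  lag1      = lag_k(ret, 1),
  lag2      = lag_k(ret, 2),
  lag3      = lag_k(ret, 3),
  lag4      = lag_k(ret, 4),
  # Assumed labels: "Up" for a non-negative return, "Down" otherwise
  direction = factor(ifelse(ret >= 0, "Up", "Down"), levels = c("Down", "Up"))
)
print(stocks)
```

With two tickers in one data frame, the lags would need to be computed separately per symbol so that one stock's returns do not leak into the other's lag columns.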
Twitter Data for Elon Musk between \(2016\) and \(2022\)
This dataset contains Elon Musk’s tweets from 2015 to 2022, stored in CSV format, where each row represents a single tweet object. The tweets were collected, parsed, and plotted using the Twitter API and the rtweet package in R. In total, the dataset contains more than ten thousand tweets, including retweets, replies, and quotes. All objects are to be stored in a single database.
Here, we use the pairs function to create a scatterplot
matrix for every pair of variables in the stock dataset as shown
below.
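```r
# Copy of the data frame df without the free-text column alltext
df.pairs <- df %>% dplyr::select(-alltext)

# Scatterplot matrix of the stock dataset
pairs(stocks.data)
```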
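Because a scatterplot matrix is only meaningful for numeric variables, it can help to select those columns before calling pairs. The following is a minimal sketch on made-up data; the data frame `toy` and its values are hypothetical, and only the column names mirror the variable table.

```r
# Toy stand-in for the stock data frame; the values are random and serve
# only to illustrate the call.
set.seed(42)
toy <- data.frame(
  symbol = rep(c("TSLA", "TWTR"), each = 25),
  volume = runif(50, 1e6, 5e7),
  return = rnorm(50),
  lag1   = rnorm(50),
  stringsAsFactors = FALSE
)

# Keep only numeric columns so that text columns are left out of the plot
numeric.cols <- toy[, sapply(toy, is.numeric), drop = FALSE]

# Scatterplot matrix: one panel per pair of numeric variables
pairs(numeric.cols, main = "Scatterplot matrix (toy data)")
```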
Based on the correlation coefficients and their corresponding
p-values, the daily return is associated with the predictors
volume, lag2, nfav, nretweet, and
nreply.
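One way to obtain a correlation coefficient and its p-value for each predictor is `cor.test` from base R. The sketch below is an illustration on simulated data, not the analysis that produced the result above: the data frame `toy`, the simulated relationship, and the choice of Pearson correlation are all assumptions.

```r
# Simulated data: return depends on volume but not on lag2 (by construction)
set.seed(1)
n <- 200
toy <- data.frame(volume = rnorm(n), lag2 = rnorm(n))
toy[["return"]] <- 0.5 * toy$volume + rnorm(n)

predictors <- c("volume", "lag2")

# Run one correlation test per predictor and collect the results in a table
results <- do.call(rbind, lapply(predictors, function(p) {
  ct <- cor.test(toy[[p]], toy[["return"]])  # Pearson correlation by default
  data.frame(
    predictor = p,
    r         = unname(ct$estimate),  # sample correlation coefficient
    p.value   = ct$p.value            # two-sided test of zero correlation
  )
}))
print(results)
```

A small p-value indicates that the observed linear association is unlikely under zero correlation; it does not by itself indicate that the predictor is useful for forecasting returns.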