For this project, we plan to analyze stock price and news headline data, with the ultimate goal of building a machine learning model that predicts stock prices from current events.
Stock price dataset (Download):
https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs
News headlines dataset (Download):
https://www.kaggle.com/aaron7sun/stocknews
Additional headlines dataset (Web-Crawling):
We will collect these headlines by crawling the Reddit news page (r/news) with the pushshift.io Reddit API (https://github.com/pushshift/api).
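As a rough illustration of the crawling step, the sketch below only builds a query URL for one day of r/news submissions; it does not send any request. The endpoint and the parameter names (`subreddit`, `after`, `before`, `size`) are assumptions based on the pushshift API README and should be checked against the current state of the service. The helper name `build_query_url` is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint, based on the pushshift API README; verify before use.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_query_url(subreddit, start, end, size=25):
    """Build a search URL for submissions posted between start and end.

    start and end are timezone-aware datetimes; they are sent as
    Unix epoch seconds in the (assumed) after/before parameters.
    """
    params = {
        "subreddit": subreddit,
        "after": int(start.timestamp()),
        "before": int(end.timestamp()),
        "size": size,
    }
    return BASE_URL + "?" + urlencode(params)


# Example: one day of r/news submissions.
url = build_query_url(
    "news",
    datetime(2016, 7, 1, tzinfo=timezone.utc),
    datetime(2016, 7, 2, tzinfo=timezone.utc),
)
print(url)
```

Looping this helper over consecutive days would give one request per day, which matches the per-day layout of the downloaded headlines dataset.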
We expect the stock price and news headline datasets to be related, since national and international events affect the economic outlook, which in turn causes stock prices to fluctuate. We plan to test whether the type of news event (for example, a war or a presidential election outcome) correlates with changes in the stock prices of companies in particular industries. To show these correlations, we will apply exploratory data analysis (EDA) to the data, comparing features of the headlines such as names, places, and other keywords against stock price movements to see whether any strong connections exist.
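One simple form this comparison could take is sketched below: flag each trading day by whether any headline contains a given keyword, then compare the mean daily return on flagged and unflagged days. This is only a minimal sketch under stated assumptions, not the analysis itself. The toy prices and headlines are made up for illustration, and the function names are hypothetical.

```python
from statistics import mean

# Made-up toy data for illustration only (not taken from either dataset).
close_by_date = {
    "2016-06-27": 100.0,
    "2016-06-28": 110.0,
    "2016-06-29": 99.0,
    "2016-06-30": 108.9,
}
headlines_by_date = {
    "2016-06-28": ["Election results announced"],
    "2016-06-29": ["War breaks out in the region"],
    "2016-06-30": ["Markets recover"],
}


def daily_returns(closes):
    """Return {date: fractional change in close versus the previous date}."""
    dates = sorted(closes)  # ISO dates sort chronologically as strings
    return {
        cur: (closes[cur] - closes[prev]) / closes[prev]
        for prev, cur in zip(dates, dates[1:])
    }


def mean_return_by_keyword(headlines, returns, keyword):
    """Mean return on days with and without the keyword in any headline."""
    hit, miss = [], []
    for date, ret in returns.items():
        texts = headlines.get(date, [])
        found = any(keyword.lower() in text.lower() for text in texts)
        (hit if found else miss).append(ret)
    return (mean(hit) if hit else None, mean(miss) if miss else None)


returns = daily_returns(close_by_date)
with_kw, without_kw = mean_return_by_keyword(headlines_by_date, returns, "war")
print(with_kw, without_kw)
```

Plain substring matching is a deliberate simplification: it would also match "warning" or "award", so a real analysis would likely need tokenization or entity extraction before drawing conclusions.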
The stock price dataset provides the open, high, low, and close prices for each stock, along with the trading volume and the date on which the prices were recorded. All of the stocks are traded on the NYSE, NASDAQ, or NYSE MKT.
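To make the record layout concrete, the sketch below parses a small in-memory sample with these fields into typed rows. The header names (`Date`, `Open`, `High`, `Low`, `Close`, `Volume`) are an assumption about how the downloaded files are laid out, and the sample rows are made up; both should be checked against the actual files.

```python
import csv
import io

# Made-up sample rows; the header layout is an assumption about the files.
SAMPLE = """Date,Open,High,Low,Close,Volume
2016-06-29,10.0,12.0,9.5,11.0,1000
2016-06-30,11.0,11.5,10.0,10.5,1500
"""


def parse_prices(text):
    """Parse CSV text into a list of dicts with numeric price fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": record["Date"],
            "open": float(record["Open"]),
            "high": float(record["High"]),
            "low": float(record["Low"]),
            "close": float(record["Close"]),
            "volume": int(record["Volume"]),
        })
    return rows


rows = parse_prices(SAMPLE)
# Intraday range (high minus low) is one simple volatility feature.
ranges = [row["high"] - row["low"] for row in rows]
print(rows[0], ranges)
```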
The news headlines dataset provides the top 25 headlines from /r/worldnews, a Reddit community where users post news articles about events outside the US. The dataset contains headlines from 2008-06-08 to 2016-07-01.