Tick-er-Tweet

Sentiment analysis of tweets, analyzing stock trends and impact of tweets on the stock market

Contents

This document describes my final project for BIOF309 Introduction to Python. The code and all supporting files are in the Tick-er-Tweet GitHub repo.

Objective

The idea for this project came from noticing that when Trump mentioned a company in his tweets, its share price would rise if the tweet was positive or fall if the tweet was negative. We decided to write a Python script that follows Trump and monitors the companies he has mentioned since the day he was inaugurated as POTUS, to see how each tweet affected the company's share price or the general stock market indices. The goal of this project is to provide useful links for researchers who want to work in this domain while remaining accessible to developers who want to integrate sentiment analysis into their applications. The code is flexible enough to be adapted to retrieve tweets from other users and data for other stocks.

Introduction

The code is divided into five separate classes so that each part is easy to understand, edit and call from the main() function:

Flowchart

Name                                               Relevant files               Output
Getting stock data                                 main.py                      stockdata
Twitter scraping                                   scrape.py, get_metadata.py   tweetdata
Preprocessing of tweet and stock data              processdata.py               Text file
Sentiment analysis of tweets                       sentimentanalysis.py         Report
Candlestick plot of stock and filtered tweet data  plots.py                     Text & figures (sampleoutput)

Description

List of Python 3.6 packages required across all scripts

pip3 install numpy
pip3 install pandas
pip3 install matplotlib
pip3 install seaborn
pip3 install tweepy
pip3 install selenium
pip3 install quandl
pip3 install plotly
pip3 install textblob

json is also used, but it is part of the Python standard library and does not need to be installed with pip.

Twitter_Scraping

Twitter makes it hard to get all of a user's tweets once they have posted more than 3,200, the maximum the API returns from a user's timeline. This is a way around that limit using Python, Selenium and Tweepy. Selenium opens a browser and automatically visits Twitter's search page, searching for a single user's tweets on a single day. To get all tweets from 2015, the script checks all 365 days (pages). This would be a nightmare to do manually, so the scrape.py script does it all for you: input a date range and a Twitter user handle, and wait for it to finish. The scrape.py script collects tweet IDs. Given a tweet's ID, Tweepy can retrieve all the information available about that tweet: text, timestamp, number of retweets/replies/favorites, geolocation, etc. Tweepy uses Twitter's API, so you will need API keys. Once you have them, you can run the get_metadata.py script.
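The one-search-per-day idea can be sketched as a generator of search URLs, one per day in the range. This is a minimal illustration, not the code in scrape.py: the function name daily_search_urls and the exact URL layout are assumptions, built on Twitter's from:, since: and until: search operators. Each URL would then be opened in the Selenium-driven browser (for example with driver.get(url)).

```python
from datetime import date, timedelta
from urllib.parse import quote


def daily_search_urls(handle, start, end):
    """Yield one search URL per day from start (inclusive) to end (exclusive).

    ASSUMPTION: this URL layout is illustrative; scrape.py may build it differently.
    """
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        # since: is inclusive and until: is exclusive, so each query covers one day
        query = f"from:{handle} since:{day.isoformat()} until:{next_day.isoformat()}"
        # quote() percent-encodes ':' and spaces so the query is URL-safe
        yield "https://twitter.com/search?f=tweets&q=" + quote(query)
        day = next_day


# A full year produces one page to visit per day
urls = list(daily_search_urls("realDonaldTrump", date(2015, 1, 1), date(2016, 1, 1)))
print(len(urls))
```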

Requirements:

  1. Tweepy: pip3 install tweepy
  2. Selenium: pip3 install selenium
  3. Twitter Apps account (for the API keys)
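Turning the collected IDs into full metadata is a batching problem, because Twitter's v1.1 statuses/lookup endpoint accepts at most 100 IDs per request. The sketch below is hypothetical (batches and fetch_metadata are not names from get_metadata.py). The lookup function is passed in so the logic runs without credentials; with a real client it could be Tweepy 3.x's api.statuses_lookup, which Tweepy 4.x renamed to api.lookup_statuses.

```python
from typing import Callable, Iterable, Iterator, List

# statuses/lookup accepts at most 100 tweet IDs per request
BATCH_SIZE = 100


def batches(ids: Iterable[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Split tweet IDs into consecutive lists of at most `size` items."""
    batch = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover IDs that did not fill a whole batch
        yield batch


def fetch_metadata(ids: Iterable[int], lookup: Callable[[List[int]], list]) -> list:
    """Call `lookup` once per batch and collect every status it returns.

    ASSUMPTION: `lookup` takes a list of IDs and returns a list of statuses,
    e.g. api.statuses_lookup (Tweepy 3.x) or api.lookup_statuses (Tweepy 4.x).
    """
    results = []
    for batch in batches(ids):
        results.extend(lookup(batch))
    return results
```

Injecting lookup keeps the batching testable offline and makes it easy to swap Tweepy versions.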

Get_stock_data

Yahoo! Finance has decommissioned its historical data API, and as a result the most popular Python packages for retrieving that data have stopped working properly. This script uses the Quandl API to retrieve stock data and returns an .xlsx file for the list of stock tickers provided in the query. 📈

Requirements: Quandl (pip3 install quandl) and a Quandl API key
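A minimal sketch of the download step, assuming one Quandl dataset per ticker. The dataset code pattern "WIKI/<TICKER>" (Quandl's free WIKI prices database, which is no longer updated) and the helper names download_prices and save_to_excel are assumptions for illustration; substitute the database the project actually queries. quandl.ApiConfig.api_key and quandl.get(..., start_date=..., end_date=...) are the package's documented entry points; the fetch parameter exists only so the logic can be exercised offline.

```python
def download_prices(tickers, start, end, api_key=None, fetch=None):
    """Return a dict mapping each ticker to the table returned for it.

    ASSUMPTION: dataset codes follow the hypothetical "WIKI/<TICKER>" pattern.
    """
    if fetch is None:
        import quandl  # imported lazily so the sketch can be tested without the package

        quandl.ApiConfig.api_key = api_key

        def fetch(code):
            # quandl.get returns a pandas DataFrame indexed by date
            return quandl.get(code, start_date=start, end_date=end)

    return {ticker: fetch("WIKI/" + ticker) for ticker in tickers}


def save_to_excel(frames, path):
    """Write one worksheet per ticker (needs pandas plus an Excel engine such as openpyxl)."""
    import pandas as pd

    with pd.ExcelWriter(path) as writer:
        for ticker, frame in frames.items():
            frame.to_excel(writer, sheet_name=ticker)
```

Usage would be along the lines of save_to_excel(download_prices(["AAPL", "F"], "2017-01-20", "2017-12-31", api_key="<quandl-api-key>"), "stockdata.xlsx").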

Data_preprocessing

Performs basic cleaning and filtering of the tweet data (.csv) and the stock data (.xlsx). This script also filters the tweet data to produce a DataFrame of tweets mentioning the stocks/keywords of interest.
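The keyword filter can be sketched with the standard library alone. This is an illustrative version rather than processdata.py itself: the function names and the assumption that each tweet is a dict with a "text" key are hypothetical. Whole-word, case-insensitive matching keeps "Ford" from matching "Fordham".

```python
import re

# Matches http/https links, which carry no sentiment and can confuse keyword matching
URL_RE = re.compile(r"https?://\S+")


def clean_tweet(text):
    """Strip URLs and collapse runs of whitespace."""
    text = URL_RE.sub("", text)
    return " ".join(text.split())


def mentions(text, keywords):
    """Return the keywords that appear in `text` as whole words, ignoring case."""
    found = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def filter_tweets(tweets, keywords):
    """Keep only tweets (dicts with a hypothetical 'text' key) that mention a keyword."""
    return [t for t in tweets if mentions(clean_tweet(t["text"]), keywords)]
```

With a pandas DataFrame the same idea is roughly df[df["text"].str.contains(r"\bFord\b", case=False, regex=True)], again assuming a "text" column.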

Sentiment_Analysis

This script takes the .csv tweet file and classifies the sentiment of every tweet as positive (+1), negative (-1) or neutral (0). It also reports some general information about trends in the tweet file, such as the maximum likes and maximum retweets.

Requirements: TextBlob (pip3 install textblob)
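TextBlob reports a polarity score between -1 and 1 through TextBlob(text).sentiment.polarity. A minimal sketch of turning that into the +1/-1/0 labels, assuming the sign of the polarity decides the label (the threshold parameter is a hypothetical knob for widening the neutral band), plus the max-likes/max-retweets summary, assuming hypothetical "likes" and "retweets" keys on each tweet:

```python
def polarity_to_label(polarity, threshold=0.0):
    """Map a polarity in [-1, 1] to +1 (positive), -1 (negative) or 0 (neutral)."""
    if polarity > threshold:
        return 1
    if polarity < -threshold:
        return -1
    return 0


def label_tweet(text):
    """Score one tweet with TextBlob's default analyzer."""
    from textblob import TextBlob  # lazy import: only needed for real scoring

    return polarity_to_label(TextBlob(text).sentiment.polarity)


def summarize(tweets):
    """Return (most-liked tweet, most-retweeted tweet).

    ASSUMPTION: each tweet is a dict with 'likes' and 'retweets' counts.
    """
    most_liked = max(tweets, key=lambda t: t["likes"])
    most_retweeted = max(tweets, key=lambda t: t["retweets"])
    return most_liked, most_retweeted
```

Keeping the labeling rule separate from the TextBlob call makes it easy to try other thresholds without rescoring every tweet.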

Plotting_data

The stock data is plotted with the Plotly package for Python 3. plotly.py is an interactive, browser-based graphing library for Python ✨. You need a free account to access online plots, but you can also plot data offline using the package's offline mode.

Requirements: Plotly (pip3 install plotly)
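An offline candlestick chart needs only the date, open, high, low and close columns. The sketch below is an assumption about how plots.py might be organized (candlestick_kwargs and plot_offline are hypothetical names); go.Candlestick and plotly.offline.plot are real Plotly calls, and the offline route writes an HTML file without needing a Plotly account.

```python
def candlestick_kwargs(rows):
    """Turn a non-empty list of (date, open, high, low, close) rows into Candlestick arguments."""
    dates, opens, highs, lows, closes = (list(col) for col in zip(*rows))
    return {"x": dates, "open": opens, "high": highs, "low": lows, "close": closes}


def plot_offline(rows, filename="candlestick.html"):
    """Write an interactive candlestick chart to an HTML file and return its path."""
    import plotly.graph_objs as go
    from plotly.offline import plot

    fig = go.Figure(data=[go.Candlestick(**candlestick_kwargs(rows))])
    # auto_open=False writes the file without launching a browser
    return plot(fig, filename=filename, auto_open=False)
```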

Main()

The main() function calls the classes above in the order listed and returns the results. To run it:

python3 main.py "stockdata.xlsx" "indexdata.xlsx" "<quandl-api-key>" "<twitter-user-handle>"
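One way to read the four positional arguments in that command is argparse. This is a sketch, not necessarily how main.py parses them (it may read sys.argv directly); the argument names stock_file, index_file, quandl_key and twitter_handle are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional arguments of the main.py command line."""
    parser = argparse.ArgumentParser(description="Tick-er-Tweet pipeline")
    parser.add_argument("stock_file", help="workbook for individual stock data, e.g. stockdata.xlsx")
    parser.add_argument("index_file", help="workbook for stock index data, e.g. indexdata.xlsx")
    parser.add_argument("quandl_key", help="Quandl API key")
    parser.add_argument("twitter_handle", help="Twitter user whose tweets are analyzed")
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)
```

A side benefit over raw sys.argv indexing is that a missing argument produces a usage message instead of an IndexError.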

Inference

We found weak to no correlation between the sentiment scores of Trump’s tweets and the stock index, since many factors can affect the stock market, including:

  • New policies not mentioned in tweets
  • News about a company’s earnings, acquisitions etc.
  • A switch in investor sentiment in general

For most companies, only a short-term effect on share price was observed, and prices seemed to recover from the dip in the long run.
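A "weak to no correlation" finding can be quantified with the Pearson coefficient r between a daily sentiment series and the index's daily percentage returns; values near 0 indicate little linear relationship. This is an illustrative sketch, not the project's own analysis: the function names are hypothetical, and it assumes the two series have already been aligned day by day (for example, each day's mean sentiment label against that day's or the next trading day's return).

```python
from math import sqrt


def daily_returns(closes):
    """Percentage change between consecutive closing values."""
    return [(b - a) / a * 100 for a, b in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, already aligned series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (sx * sy)
```

Comparing r for same-day returns against r for returns several days later is one way to probe the short-term versus long-term effect described above.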

Challenges and Path forward

The sentiment analysis tool cannot accurately gauge the sentiment of sarcastic tweets, or of tweets whose sentiment is not carried by clearly positive, negative or neutral keywords.

New features could be added to the script to describe the nature of the tweets, the stock data and their correlation in more detail. Additional data could be gathered from other sources to make the analysis more reliable.

Acknowledgment


  • Martin Skarzynski, Michael & Ben

Collaborators


Please send any questions/comments to us: anupmath at gmail dot com or atimahs16 at gmail dot com. 📢