stock-sentiment-analysis

Introduction

The stock market is arguably one of the most competitive environments there is, and every trader and institution is trying to gain an edge in an algorithmic arms race. Even though algorithmic trading systems have taken over the trading floors and now account for the majority of trades, human psychology still plays an important role in the market's movements. Quantitative market data is widely available and widely used for trading, but there are also massive quantities of qualitative text data that influence the market. In this project, our goal is to collect a data set of stock tweets and train a model that quantifies the sentiment of a given opinion about the stock market.

Methodology

Our task is to classify a given tweet as either bullish or bearish. Bullish means the author thinks the stock is going up; bearish means they think it is going down. This makes it a binary text classification problem.

This project consists of four parts:

Collecting a data set and understanding potential bias
Loading, cleaning, and embedding data
Initializing the model
Training and evaluating the model
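
As an illustration of the cleaning step listed above, here is a minimal sketch of how a raw tweet might be normalized before it is encoded. The specific rules (unescaping HTML entities, dropping links, collapsing whitespace) and the name `clean_tweet` are assumptions made for this example, not necessarily what the project does.

```python
import html
import re

# Hypothetical cleaning rules chosen for illustration.
URL_RE = re.compile(r"https?://\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet while keeping cashtags such as $AAPL."""
    text = html.unescape(text)           # turn entities like "&amp;" back into "&"
    text = URL_RE.sub("", text)          # remove links, which carry little sentiment
    text = WHITESPACE_RE.sub(" ", text)  # collapse runs of spaces and newlines
    return text.strip()
```

Cashtags are left untouched here because the ticker symbol may be useful context for the model; whether to keep them is a design choice worth testing.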

Collecting a data set

To collect a data set, we use the Stocktwits API. Stocktwits is a Twitter-like social platform dedicated to stock market discussion. When users post on the platform, they can mark their opinion of a stock as either bullish or bearish, which gives us labeled examples without manual annotation.
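
The collection step could look roughly like the sketch below. The endpoint URL and the response shape (a `messages` list whose items carry a `body` and an `entities.sentiment.basic` field) are assumptions based on the public symbol stream and should be checked against the current Stocktwits API documentation. The function names and the 0/1 label mapping are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumed endpoint for a symbol's message stream; verify against the API docs.
STREAM_URL = "https://api.stocktwits.com/api/2/streams/symbol/{symbol}.json"

# Hypothetical label encoding for the binary classifier.
LABELS = {"Bearish": 0, "Bullish": 1}


def fetch_symbol_stream(symbol: str) -> dict:
    """Download the most recent messages for one ticker symbol."""
    url = STREAM_URL.format(symbol=symbol)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def extract_labeled(payload: dict) -> List[Tuple[str, int]]:
    """Keep only messages whose author tagged them as bullish or bearish."""
    examples = []
    for message in payload.get("messages", []):
        # "entities" or "sentiment" may be missing or null for untagged posts.
        sentiment = (message.get("entities") or {}).get("sentiment") or {}
        label = LABELS.get(sentiment.get("basic"))
        if label is not None:
            examples.append((message["body"], label))
    return examples
```

Calling `extract_labeled(fetch_symbol_stream("AAPL"))` would then return a list of `(text, label)` pairs. Because only self-tagged posts are kept, the data set reflects users who chose to declare a sentiment, which is one source of the potential bias mentioned earlier.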

BERT Model

In this project, we fine-tune BERT, a robust NLP model. BERT is a transformer model developed by Google in 2018. To our benefit, the awesome people over at Hugging Face have made NLP a little bit easier for us. Their transformers library lets us download a pretrained BERT model and use it with PyTorch, and it includes handy utilities for encoding our data into the format the model expects. We fine-tune BERT with one additional output layer for binary classification.
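
A minimal sketch of this setup with the transformers library is shown below. The checkpoint name `bert-base-uncased`, the `max_length` of 64, and the helper names are assumptions for illustration; the project may use different values. `BertForSequenceClassification` adds a classification layer on top of the pretrained encoder, and that layer starts out randomly initialized, so predictions are only meaningful after fine-tuning.

```python
# Assumed checkpoint; any BERT checkpoint on the Hugging Face Hub would work.
MODEL_NAME = "bert-base-uncased"

# Hypothetical label encoding: 0 = bearish, 1 = bullish.
ID2LABEL = {0: "bearish", 1: "bullish"}


def load_bert_classifier(model_name: str = MODEL_NAME):
    """Load the tokenizer and BERT with a two-class output layer."""
    # Imported here so this module can be loaded without the heavy dependency.
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(model_name)
    model = BertForSequenceClassification.from_pretrained(
        model_name, num_labels=len(ID2LABEL)
    )
    return tokenizer, model


def predict(tokenizer, model, texts):
    """Classify a list of tweets and return a label name for each one."""
    import torch

    # Pad and truncate so all tweets in the batch share one tensor shape.
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=64, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return [ID2LABEL[i] for i in logits.argmax(dim=-1).tolist()]
```

After fine-tuning, `predict(tokenizer, model, ["$TSLA breaking out"])` would return a one-element list containing either `"bullish"` or `"bearish"`.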
