Numerical-and-Textual-Analysis-for-Stock-Prediction

This is one of the project tasks from the @GRIP Internship Program of The Spark Foundation

Objective:

Predict the Indian Stock Exchange Sensitive Index - SENSEX from historical stock price and news headlines data from 2015/01/01-31/03/2022.

Transform Daily Close prices into logarithmic returns to attain better statistical properties, then used along side with Yesterday Close prices as numerical feature inputs.
Utilize the Huggingface pretrained RoBERTa cardiffnlp/twitter-roberta-base-sentiment-latest model for Sentiment Analysis on news headlines.
The LSTM was trained on numerical data only and used as a Baseline to contrast with the LightGBM which was trained on both numerical and textual analyzed data.

The LightGBM Regressor model as expected was able to fit and generalize better than the LSTM model with significantly lower RMSE (Root Mean Square Error).
The LightGBM Classifier was able to correctly predict the next day stock market's general movement (Buliish or Bearish) by 56.10%.