This project publishes (tweets) daily forecasts from 7 models for stocks in the financial market. Users can see the predictions and request predictions for the companies they want.
Some of the main technologies and packages used:
- Web scraping with pandas (a minimal sketch follows this list)
- Machine learning with sklearn and scikit-optimize
- Twitter bot built on the Tweepy API
- Deployment: PythonAnywhere (I'm looking for another online hosting platform)
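As a taste of the scraping step, here is a minimal sketch of pulling a price-history table with pandas. The URL and column handling are assumptions for illustration and may differ from what regress_bot.py actually does.

```python
# Minimal scraping sketch, assuming the ticker's history page exposes an
# HTML table that pandas can parse; illustrative URL, not the bot's exact one.
import pandas as pd

def fetch_history(ticker: str) -> pd.DataFrame:
    url = f"https://finance.yahoo.com/quote/{ticker}/history"  # assumed URL
    tables = pd.read_html(url)  # returns every <table> on the page
    return tables[0]  # first table: date, open, high, low, close, volume
```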
You can see the final project here: the Twitter account of Regress.
- regress_bot.py and funcs.py: the Python program that runs on PythonAnywhere and contains the bot code and the model training; funcs.py holds helper functions used by the regress_bot.py program.
- companies.txt and last-mention-id.txt: examples of the text files the Regress Bot uses to store the companies it predicts and tweets about, and the ID of the last tweet that mentioned its account (a sketch of how these files drive the bot follows this list).
- report.csv: example of a report that the Regress Bot updates every day.
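To make the role of these files concrete, here is a hypothetical sketch (not the actual bot code) of how they could drive the mention-handling loop, using Tweepy's v3-style API. The credential placeholders and the ticker-parsing rule are illustrative assumptions.

```python
# Hypothetical sketch: read mentions newer than the last handled one and
# append requested tickers to companies.txt. Credentials are placeholders.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

with open("last-mention-id.txt") as f:
    last_id = int(f.read().strip())

# Fetch only mentions newer than the last one already handled
for mention in api.mentions_timeline(since_id=last_id):
    # Assumed request format, e.g. "@RegressML predict AAPL"
    ticker = mention.text.split()[-1].upper()
    with open("companies.txt", "a") as f:
        f.write(ticker + "\n")
    last_id = max(last_id, mention.id)

with open("last-mention-id.txt", "w") as f:
    f.write(str(last_id))
```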
The program trains 7 models every day, tuning their hyperparameters: Stochastic Gradient Descent, Ridge Regression, Linear Support Vector Regressor, K-Nearest Neighbors, Random Forest, AdaBoost, and MLP. The models are trained on the last 30 days of data and tested on the last 5; the best 3, ranked by RMSE (Root Mean Squared Error), are chosen to tweet their predictions (a sketch of this selection step follows the table below). From what I have observed, Linear SVR and SGD are the best ones.
| Model | Tuned Hyperparameters |
|---|---|
| Stochastic Gradient Descent | Penalty, Alpha, and Learning Rate |
| Ridge Regression | Regularization |
| Linear Support Vector Regression | Regularization |
| Regression based on k-NN | Number of Neighbors and Weights |
| Random Forest Regressor | Number of Trees |
| AdaBoost | Number of Estimators and Learning Rate |
| Multi-layer Perceptron Regressor | Activation Function and Learning Rate |
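Putting the pieces together, the sketch below shows one way the daily tune-and-select step could look with scikit-optimize's BayesSearchCV: tune each model, score it by RMSE on the held-out last 5 days, and keep the best 3. The search spaces and iteration counts are illustrative, not the bot's actual settings, and only 3 of the 7 models are spelled out.

```python
# Condensed sketch of the daily training/selection step (not the exact code
# in regress_bot.py). Search spaces below are illustrative assumptions.
import numpy as np
from skopt import BayesSearchCV
from skopt.space import Real, Categorical
from sklearn.linear_model import SGDRegressor, Ridge
from sklearn.svm import LinearSVR
from sklearn.metrics import mean_squared_error

models = {
    "SGD": (SGDRegressor(), {
        "penalty": Categorical(["l2", "l1", "elasticnet"]),
        "alpha": Real(1e-6, 1e-1, prior="log-uniform"),
        "learning_rate": Categorical(["constant", "optimal", "invscaling"]),
    }),
    "Ridge": (Ridge(), {"alpha": Real(1e-3, 1e2, prior="log-uniform")}),
    "LinearSVR": (LinearSVR(), {"C": Real(1e-3, 1e2, prior="log-uniform")}),
    # ...the other four models follow the same pattern
}

def pick_best_three(X, y):
    # X, y hold the last 30 days; hold out the last 5 days for testing
    X_train, X_test = X[:-5], X[-5:]
    y_train, y_test = y[:-5], y[-5:]
    scores = {}
    for name, (estimator, space) in models.items():
        search = BayesSearchCV(estimator, space, n_iter=20, cv=3)
        search.fit(X_train, y_train)
        rmse = np.sqrt(mean_squared_error(y_test, search.predict(X_test)))
        scores[name] = (rmse, search.best_estimator_)
    # lowest RMSE first; the top 3 get to tweet their predictions
    return sorted(scores.items(), key=lambda kv: kv[1][0])[:3]
```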
As the data is a time series, the ideal approach would be to train models like ARIMA or Prophet (from Facebook).
I chose to use more "classic" models because I wanted to see how they would perform; as a next step, it would be worth seeing how dedicated time series models would predict. Furthermore, since the program is hosted on a free platform, PythonAnywhere, it is impractical to train models on a lot of data, which is why I opted for 30 days. Interestingly, training the models on more days (I tried up to 5 years) increased their RMSE. Maybe, with more recent data, the models learn stronger relations between the features and the label, since recent data better reflects today's data.
Training an MLP (Multi-Layer Perceptron) on such a small dataset may be disproportionate, but I wanted to see how it would perform; it seems to be a good model nonetheless.
More details on model training and selection are in the regress_bot.py and funcs.py files. The models are trained every day with new data scraped from the Yahoo Finance website, their hyperparameters are tuned with scikit-optimize, and their predictions are posted on the @RegressML account on Twitter.
My Data Science portfolio: link
My LinkedIn: link
Bruno Kenzo, 18 yo.