This project uses a random forest algorithm to emit monthly buy signals from the fundamentals of the companies listed in the S&P 1500.
The objective is to beat the performance of the Standard & Poor's 500 (S&P 500), the most commonly used benchmark in the finance industry.
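The buy-signal idea described above can be illustrated with a minimal sketch. It is not the notebook's actual pipeline: the features are synthetic random numbers standing in for financial ratios, the label ("beat the benchmark next month") is made up, and the 0.6 probability threshold is an arbitrary assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# Synthetic stand-in for one row per (company, month) with three made-up ratios.
n_samples = 600
X = rng.normal(size=(n_samples, 3))
# Hypothetical label: 1 if the stock beat the benchmark the following month.
noise = rng.normal(scale=0.5, size=n_samples)
y = (X[:, 0] - 0.5 * X[:, 1] + noise > 0).astype(int)

# Chronological-style split: fit on the earlier rows, predict the later ones.
split = 480
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])

# Column 1 of predict_proba is the probability of class 1 (see model.classes_).
proba_up = model.predict_proba(X[split:])[:, 1]
buy_signal = proba_up > 0.6  # arbitrary illustrative threshold
```

Splitting by row order rather than at random mimics training on past months and predicting later ones, which avoids leaking future information into the model.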
All the required code is in the SP1500stockPicker.ipynb file. You can run it with Jupyter Notebook, which ships with the Anaconda distribution, available at https://www.anaconda.com/distribution/#download-section. The code runs on Python 3.7.
The data used in the program was obtained from Bloomberg and Compustat. Both services are proprietary, so the data cannot be published online. The program uses the following data files:
- ratios_1990_2019.csv (File containing all the financial ratios of the US companies, obtained from Compustat)
- yield_1962_2019.csv (File containing all the monthly returns of the US companies, obtained from Bloomberg)
- SP1500constituents.csv (File containing all the past and present constituents of the S&P 1500 index, obtained from Compustat)
- ^GSPC.csv (File containing all the closing prices of the S&P 500, obtained via Yahoo Finance)
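Since the real files cannot be shared, the sketch below uses tiny in-memory tables to show one way ratios, returns and index membership could be combined with pandas. All column names (`ticker`, `month`, `pe_ratio`, `monthly_return`) and values are hypothetical and are not taken from the actual files.

```python
import pandas as pd

# Toy stand-ins for the proprietary files; every column name is hypothetical.
ratios = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB"],
    "month": ["2019-01", "2019-02", "2019-01"],
    "pe_ratio": [15.2, 14.8, 22.1],
})
returns = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "CCC"],
    "month": ["2019-01", "2019-02", "2019-01", "2019-01"],
    "monthly_return": [0.02, -0.01, 0.03, 0.05],
})
constituents = pd.DataFrame({"ticker": ["AAA", "BBB"]})

# Keep only index members, then align ratios and returns per company and month.
in_index = returns[returns["ticker"].isin(constituents["ticker"])]
merged = ratios.merge(in_index, on=["ticker", "month"], how="inner")
```

An inner join keeps only the company-month pairs present in both tables, so rows with missing ratios or returns drop out before any modelling step.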
- python==3.7
- scikit-learn==0.21.2
- pandas==0.24.2
- matplotlib==3.1.1
- numpy==1.16.4
The critical functions of the program (database merging and machine learning) use multiprocessing to speed up their tasks. The entire notebook takes about 2.5 hours to run on a 24-thread computer with 128 GB of RAM, so performance will vary depending on your hardware.
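As a rough illustration of the multiprocessing pattern mentioned above, the sketch below splits a workload into one chunk per worker and processes the chunks in a `multiprocessing.Pool`. The function names and the sum-of-squares workload are placeholders, not the notebook's own code.

```python
from multiprocessing import Pool


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous parts of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Placeholder for the per-chunk work (here: a sum of squares)."""
    return sum(x * x for x in chunk)


def run_parallel(items, workers):
    """Process each chunk in its own worker process and combine the results."""
    chunks = split_into_chunks(items, workers)
    with Pool(processes=workers) as pool:
        partial_results = pool.map(process_chunk, chunks)
    return sum(partial_results)

# Usage (keep the call under an `if __name__ == "__main__":` guard):
#     total = run_parallel(list(range(1000)), workers=4)
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes. The number of workers would normally be chosen to match the available CPU threads.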
In Jupyter, select "Run All Cells" to execute the entire notebook. Intermediate steps after the data cleaning and feature selection let you skip unnecessary CPU work when changing components of the code.
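One common way to implement such intermediate steps is to cache the result of an expensive stage on disk and reload it on later runs. The sketch below assumes a pickle-based cache; the helper `load_or_compute`, the file name and the `slow_cleaning` function are hypothetical and may differ from what the notebook does.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute):
    """Return the cached result at `path`, computing and saving it if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def slow_cleaning():
    """Stand-in for an expensive stage such as data cleaning."""
    calls.append(1)
    return {"rows_kept": 3}


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "cleaned.pkl")
    first = load_or_compute(cache, slow_cleaning)   # computes and saves
    second = load_or_compute(cache, slow_cleaning)  # loads from disk
```

Deleting the cache file forces the stage to run again, which is useful after changing the code that produces it.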
- sklearn - Machine Learning package for Python
- feature_selector - Feature Selection
Feel free to reach out to me by email at thomas.rochefort-beaudoin@polymtl.ca for suggestions or questions about the repo.
- Thomas Rochefort-Beaudoin - Initial work
This project is licensed under the GNU General Public License v3.0. See the LICENSE.md file for details.
- Robert Normand, Teacher at Polytechnique Montréal and researcher at CIRANO Montréal, for his insights and help.