This GitHub repository contains the code for the case studies in the O'Reilly book Machine Learning and Data Science Blueprints for Finance.
Simply open the Jupyter notebooks you are interested in by cloning this repository and running Jupyter locally. This option lets you play around with the code. In this case, follow the installation instructions below.
Join the Slack channel to discuss the book or ML in Finance in general:
Use any of the following services.
WARNING: Please be aware that these services provide temporary environments: anything you do will be deleted after a while, so make sure you download any data you care about.
- Recommended: Open it in Binder:
- Note: Binder is a hosting service; the directories of the book open exactly as they do on your local machine, with no installation required. Links between different files within the folder work seamlessly. Most of the time, Binder starts up quickly and works great, but when the GitHub repository of this book is updated, Binder creates a new environment from scratch, and this can take quite some time. Also, some of the case studies, especially those that require more cached data, might be slow.
- Open this repository in Colaboratory:
- Note: Google Colab supports GPU and can be quite fast. However, the links to data files located in the folders of the git directory may not work. Upload the data files separately while running the Jupyter notebooks on Google Colab, or load them directly from GitHub by replacing the local directory path with the GitHub path. For example, for the data of case study 1 of chapter 7, dataset = read_csv('Dow_adjcloses.csv') in the code can be replaced with dataset = read_csv('https://raw.githubusercontent.com/tatsath/fin-ml/master/Chapter%207%20-%20Unsup.%20Learning%20-%20Dimensionality%20Reduction/CaseStudy1%20-%20Portfolio%20Management%20-%20Eigen%20Portfolio/Dow_adjcloses.csv') for it to work on Google Colab (see the sketch after this list).
- Browse this repository using jupyter.org's notebook viewer:
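For example, a minimal sketch of loading that same chapter 7 data file from its GitHub raw URL in a Colab notebook (assuming pandas is available, as it is by default on Colab):
from pandas import read_csv

# Read the case study data directly from the GitHub repository instead of a local folder
url = ('https://raw.githubusercontent.com/tatsath/fin-ml/master/'
       'Chapter%207%20-%20Unsup.%20Learning%20-%20Dimensionality%20Reduction/'
       'CaseStudy1%20-%20Portfolio%20Management%20-%20Eigen%20Portfolio/Dow_adjcloses.csv')
dataset = read_csv(url)
print(dataset.head())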
Start by installing Anaconda (or Miniconda), git, and if you have a TensorFlow-compatible GPU, install the GPU driver.
Next, clone this project by opening a terminal and typing the following commands (do not type the leading $ sign on each line; it just indicates that these are terminal commands):
$ cd $HOME # or any other development directory you prefer
$ git clone https://github.com/tatsath/fin-ml.git
$ cd fin-ml
If you do not want to install git, you can instead download master.zip, unzip it, rename the resulting directory to fin-ml
and move it to your development directory.
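For example, from a terminal (a sketch; the archive link is GitHub's standard download URL for the master branch, and the extracted directory is typically named fin-ml-master):
$ wget https://github.com/tatsath/fin-ml/archive/master.zip
$ unzip master.zip
$ mv fin-ml-master fin-ml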
If you are familiar with Python and you know how to install Python libraries, go ahead and install the libraries listed in requirements.txt
and jump to the Starting Jupyter section. If you need detailed instructions, please read on. We encourage you to stick to the versions of the packages listed in the requirements.txt file.
Of course, you need Python. Python 3 is already preinstalled on many systems nowadays. You can check which version you have by typing the following command (you may need to replace python3 with python):
$ python3 --version # for Python 3
Any Python 3 version should be fine, preferably 3.5 or above. If you don't have Python 3, we recommend installing it. To do so, you have several options: on Windows or MacOSX, you can just download it from python.org. On MacOSX, you can alternatively use MacPorts or Homebrew. If you are using Python 3.6 on MacOSX, you need to run the following command to install the certifi package of certificates, because Python 3.6 on MacOSX has no certificates to validate SSL connections (see this StackOverflow question):
$ /Applications/Python\ 3.6/Install\ Certificates.command
On Linux, unless you know what you are doing, you should use your system's packaging system. For example, on Debian or Ubuntu, type:
$ sudo apt-get update
$ sudo apt-get install python3 python3-pip
After installing Python, we recommend installing Anaconda. This is a package that includes both Python and many scientific libraries. You should prefer the Python 3 version.
Installing Anaconda should install most of the libraries commonly used in the case studies. Given that the Anaconda package may change and some libraries might be out of date, it is a good idea to learn how to install packages in Python using pip.
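For example, to check which libraries Anaconda already installed, or to add a single package with conda (a quick sketch; the package name is only illustrative):
$ conda list
$ conda install pandas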
These are the commands you need to type in a terminal if you want to use pip to install the packages. Note: in all the following commands, if you choose to use Python 2 rather than Python 3, you must replace pip3 with pip, and python3 with python.
First you need to make sure you have the latest version of pip installed. If you are on the latest version of Python, pip should already be installed. You can check using the following command.
$ pip -V
If you do not have pip installed, you can run the following command on Linux:
$ sudo apt-get install python3-pip
Or download get-pip.py and install it on Windows using
$ python3 get-pip.py
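If you need to fetch get-pip.py from the command line first, you can use curl (assuming curl is available; bootstrap.pypa.io is the standard location for this script):
$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py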
If you have pip already installed, it might be a good idea to upgrade it:
$ python3 -m pip install --user --upgrade pip
The --user option will install the latest version of pip only for the current user. If you prefer to install it system wide (i.e. for all users), you must have administrator rights (e.g. use sudo python3 instead of python3 on Linux), and you should remove the --user option. The same is true of the command below that uses the --user option.
Next, you can optionally create an isolated environment. This is recommended as it makes it possible to have a different environment for each project (e.g. one for this project), with potentially very different libraries, and different versions:
$ python3 -m pip install --user --upgrade virtualenv
$ python3 -m virtualenv -p `which python3` env
This creates a new directory called env in the current directory, containing an isolated Python environment based on Python 3. If you installed multiple versions of Python 3 on your system, you can replace `which python3` with the path to the Python executable you prefer to use.
Now you must activate this environment. You will need to run this command every time you want to use this environment.
$ source ./env/bin/activate
On Windows, the command is slightly different:
$ .\env\Scripts\activate
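As a quick sanity check that the environment is active, you can print the interpreter's prefix; the path should point inside the env directory:
$ python -c "import sys; print(sys.prefix)"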
Next, use pip to install the required Python packages. If you are not using virtualenv, you should add the --user option (alternatively you could install the libraries system-wide, but this will probably require administrator rights, e.g. using sudo pip3 instead of pip3 on Linux).
The following command is used to install a Python package with a particular version:
$ pip3 install <PACKAGE>==<VERSION>
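For example (the exact versions you need are listed in requirements.txt; the package and version below are purely illustrative):
$ pip3 install pandas==1.0.3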
If you want to install the full list of packages from a file, you can use the following command:
$ python3 -m pip install --upgrade -r requirements.txt
Great! You're all set, you just need to start Jupyter now.
For the chapter on Natural Language Processing, we will be using the spaCy Python package. Installing spaCy does not install the language models it uses. To do that, we need to download the language model ourselves using the following command:
$ python -m spacy download en_core_web_lg
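To verify that the model downloaded correctly, here is a minimal check you can run from a Python shell or notebook (assuming spaCy is installed in the active environment):
import spacy

# Load the large English model downloaded above; this fails if the download did not succeed
nlp = spacy.load('en_core_web_lg')

# Quick sanity check: the large model ships with word vectors
doc = nlp('Machine learning is transforming finance.')
print(doc.vector.shape)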
Okay! You can now start Jupyter; simply type:
$ jupyter notebook
This should open up your browser, and you should see Jupyter's tree view, with the contents of the current directory. If your browser does not open automatically, visit 127.0.0.1:8888. Click on index.ipynb to get started!
If you install a library and are not able to import it in the Jupyter notebook, you might be installing it in the system Python environment rather than the one the notebook uses. You can install packages from within a Jupyter notebook by prefixing the command with the ! symbol. The following libraries are the ones required beyond the latest Anaconda package as of now (see the sketch after this list if an import still fails):
!pip install spacy
!pip install pandas-datareader
!pip install keras
!pip install dash
!pip install dash_daq
!pip install quandl
!pip install cvxopt
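If a package installed this way still cannot be imported, the notebook kernel may be running a different Python than the pip on your PATH. One possible workaround is to install against the kernel's own interpreter via sys.executable, for example:
import sys

# Install into the exact Python environment the notebook kernel is using
!{sys.executable} -m pip install pandas-datareader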
Bitcoin Trading Strategy using classification
Bitcoin Trading - Enhancing Speed and Accuracy using dimensionality reduction
Clustering for Pairs Trading Strategy
Reinforcement Learning based Trading Strategy
NLP and Sentiment Analysis based Trading Strategy
Investor Risk Tolerance and Robo-advisors - using supervised regression
Robo-Advisor Dashboard - powered by ML
Portfolio Management - Eigen Portfolio - using dimensionality reduction
Portfolio Management - Clustering Investors
Hierarchical Risk Parity - using clustering
Portfolio Allocation - using reinforcement learning
Derivative Pricing - using supervised regression
Derivatives Hedging - using reinforcement learning
Stock Price Prediction - using regression and time series
Yield Curve Prediction - using regression and time series
Yield Curve Construction and Interest Rate Modeling - using dimensionality reduction
Fraud Detection - using classification
Loan Default Probability - using classification
Digital Assistant-chat-bots - using NLP
Documents Summarization - using NLP
Stock Price Prediction
Derivative Pricing
Investor Risk Tolerance and Robo-advisors
Yield Curve Prediction
Fraud Detection
Loan Default Probability
Bitcoin Trading Strategy
Portfolio Management - Eigen Portfolio
Yield Curve Construction and Interest Rate Modeling
Bitcoin Trading - Enhancing Speed and Accuracy
Clustering for Pairs Trading
Portfolio Management - Clustering Investors
Hierarchical Risk Parity
Reinforcement Learning based Trading Strategy
Derivatives Hedging
Portfolio Allocation
NLP and Sentiment Analysis based Trading Strategy
Digital Assistant-chat-bots
Documents Summarization
Supervised learning - Regression and Time series
Supervised learning - Classification
Unsupervised learning - Dimensionality Reduction
Unsupervised learning - Clustering
Natural Language Processing