pyprobml
Python 3 code for the new book series Probabilistic Machine Learning by Kevin Patrick Murphy. This is work in progress, so expect rough edges.
Running the scripts
The scripts directory contains Python files to generate individual figures from vol 1 and vol 2 of the book.
To manually execute an individual script from the command line,
follow this example:
git clone --depth 1 https://github.com/probml/pyprobml &> /dev/null
python3 pyprobml/scripts/softmax_plot.py
This will clone the repo (without the version history, to save time/space), run the script, plot a figure, and save the result to the pyprobml/figures directory.
Many scripts rely on external packages, such as scipy, jax, etc. In some cases, the list of necessary packages is noted in the comments at the start of the file. However, rather than installing each dependency yourself, you can install a single meta-package:
pip install superimport
This needs to be done on your local machine (or once per Colab instance). You then need to add import superimport to the top of each of your scripts. The superimport library will parse your file, figure out all missing packages, and then install them for you, before running the rest of the script as usual. (If you run the script a second time, it skips the installation step.)
Because superimport installs packages on demand, and some scripts download datasets stored in the probml-data repo, you will need an internet connection to run the code.
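To make the "figure out all missing packages" step concrete, here is a minimal sketch of how missing imports could be detected using only the Python standard library. This is an illustration of the general idea, not superimport's actual implementation; the function name find_missing_modules is hypothetical.

```python
import ast
import importlib.util


def find_missing_modules(source: str) -> list:
    """Return top-level module names imported in `source` that cannot be found.

    Hypothetical helper for illustration only; not part of superimport.
    """
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b.c" -> top-level module "a"
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local files.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    # find_spec returns None when a top-level module is not installed.
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


example = (
    "import json\n"
    "from os import path\n"
    "import a_package_that_does_not_exist_xyz\n"
)
print(find_missing_modules(example))
```

Note that this sketch only reports module names. A module name does not always match the name of the package to install (for example, the sklearn module is provided by the scikit-learn package), so a real tool also needs a mapping from one to the other.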
Jupyter notebooks
The scripts needed to make all the figures for each chapter are automatically combined together into a series of Jupyter notebooks, one per chapter.
- Volume 1 figure notebooks
- Volume 2 figure notebooks. (Note: volume 2 is not finished yet.)
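As a rough illustration of what combining scripts into a notebook involves, the sketch below writes one code cell per script using the plain JSON layout of the notebook format (version 4). It is an assumption-laden example, not the tool this repo actually uses; the function name scripts_to_notebook and the file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def scripts_to_notebook(script_paths, notebook_path):
    """Write a notebook with a title cell and a code cell per script.

    Hypothetical helper; uses the nbformat 4 JSON layout directly.
    """
    cells = []
    for p in script_paths:
        p = Path(p)
        # Markdown cell naming the script, followed by its source as code.
        cells.append({"cell_type": "markdown", "metadata": {}, "source": f"## {p.name}"})
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": p.read_text(),
        })
    notebook = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}
    Path(notebook_path).write_text(json.dumps(notebook, indent=1))
    return notebook


# Demo with two throwaway scripts in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a_plot.py").write_text("print('figure a')\n")
    (tmp / "b_plot.py").write_text("print('figure b')\n")
    nb = scripts_to_notebook(sorted(tmp.glob("*.py")), tmp / "chapter_demo.ipynb")
    reloaded = json.loads((tmp / "chapter_demo.ipynb").read_text())
    print(len(nb["cells"]))
```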
In addition to the automatically generated notebooks, there is a series of manually created notebooks, which create additional figures and provide supplementary material for the book. These are stored in the notebooks repo, since they can be quite large. Some of these notebooks use the scripts mentioned above, but others are independent of the book content.
Colab
The best way to run the code is inside Colab. This has most of the libraries you will need (e.g., scikit-learn, JAX) pre-installed, and gives you access to a free GPU and TPU. We have created an intro to Colab notebook with more details.
You can run the book code inside Colab as shown in the example below.
%%capture
!pip install superimport
!pip install deimport
!git clone --depth 1 https://github.com/probml/pyprobml /pyprobml &> /dev/null
%cd /pyprobml/scripts
%run kf_tracking_demo.py
To run code from GitHub, follow the example below. (Note the raw in the URL.)
!wget -q https://raw.githubusercontent.com/probml/pyprobml/master/scripts/softmax_plot.py
%run softmax_plot.py
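The raw URL follows a regular pattern of owner, repo, branch, and file path. The sketch below builds such a URL and shows a standard-library alternative to wget; the helper names raw_github_url and fetch_script are hypothetical, and the download itself is not executed here because it needs network access.

```python
import urllib.request


def raw_github_url(owner, repo, branch, path):
    """Build the raw-content URL for a file in a GitHub repo (hypothetical helper)."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def fetch_script(url, dest):
    """Download `url` to the local file `dest`, similar to `wget -q` (needs network)."""
    urllib.request.urlretrieve(url, dest)
    return dest


url = raw_github_url("probml", "pyprobml", "master", "scripts/softmax_plot.py")
print(url)
# To download: fetch_script(url, "softmax_plot.py")
```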
To edit a file locally and then run it, follow the example below.
%load_ext autoreload
%autoreload 2
file = 'foo.py' # change this filename as needed
!touch $file # create file if necessary
from google.colab import files
files.view(file) # open editor
%run $file
GCP, TPUs, and all that
When you want more power or control than Colab gives you, you should get a Google Cloud Platform (GCP) account, and get access to a TPU VM. You can then use this as a virtual desktop which you can access via ssh from inside VS Code. We have created various tutorials on Colab, GCP and TPUs with more information.
How to contribute
See this guide for how to contribute code.
GSOC 2021
For a summary of some of the contributions to this codebase during Google Summer of Code 2021, see this link.
Acknowledgements
I would like to thank the following people for contributing to the code (list autogenerated from this page):