Python 3 code for the new book series Probabilistic Machine Learning by Kevin Patrick Murphy. This is a work in progress, so expect rough edges.
The scripts directory contains Python files to generate individual figures from vol 1 and vol 2 of the book.
To manually execute an individual script from the command line, follow this example:

```shell
git clone --depth 1 https://github.com/probml/pyprobml /pyprobml &> /dev/null
python3 pyprobml/scripts/softmax_plot.py
```

This will clone the repo (without the version history, to save time/space), run the script, plot a figure, and save the result to the pyprobml/figures directory.
Many scripts rely on external packages, such as scipy, jax, etc. In some cases, the list of necessary packages is noted in the comments at the start of the file. However, rather than making the user install each dependency themselves, you can just install a single meta-package:

```shell
pip install superimport
```
This needs to be done once on your local machine (or once per Colab instance). You then need to add `import superimport` to the top of each of your scripts. The `superimport` library will parse your file, figure out all missing packages, and install them for you, before running the rest of the script as usual. (If you run the script a second time, it skips the installation step.)
Thus you will need an internet connection to run the code. Some scripts also download datasets stored in the probml-data repo.
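To make the mechanism concrete, here is a minimal stdlib-only sketch of the detection step; this is not superimport's actual implementation, just an illustration of how a tool can statically find which imported packages are missing (superimport additionally pip-installs the gaps before running the script):

```python
import ast
import importlib.util

def missing_packages(source: str):
    """Return the top-level module names imported in `source` that are not
    currently installed -- a simplified sketch of the detection step a tool
    like superimport performs before installing anything."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    # A module is "missing" if the import system cannot find a spec for it.
    return sorted(n for n in names if importlib.util.find_spec(n) is None)

script = "import os\nfrom collections import Counter\nimport not_a_real_pkg_xyz\n"
print(missing_packages(script))  # -> ['not_a_real_pkg_xyz']
```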
The scripts needed to make all the figures for each chapter are automatically combined into a series of Jupyter notebooks, one per chapter.
- Volume 1 figure notebooks
- Volume 2 figure notebooks. (Note: volume 2 is not finished yet.)
In addition to the automatically generated notebooks, there is a series of manually created notebooks, which create additional figures and provide supplementary material for the book. These are stored in the notebooks repo, since they can be quite large. Some of these notebooks use the scripts mentioned above, but others are independent of the book content.
When you open a Jupyter notebook, there will be a button at the top that says 'Open in Colab'. If you click on this, it will start a virtual machine (VM) instance on Google Cloud Platform (GCP), running Colab. This has most of the libraries you will need (e.g., scikit-learn, JAX) pre-installed, and gives you access to a free GPU or TPU. We have created various tutorials on Colab, GCP and TPUs with more information.
Colab has many ML-related packages already installed, but not all. We use superimport, mentioned above, to automatically install the missing ones.
One wrinkle arises if you try to run multiple scripts inside a single Colab session (e.g., using `%run foo.py` and then `%run bar.py`). Because Python caches imported modules, `superimport` will only be invoked the first time, so it will work on `foo.py` but fail on `bar.py`.
To force it to call superimport for each script, you need to unimport the superimport symbol before running the script, like this:

```python
from deimport.deimport import deimport
deimport(superimport)
%run myscript.py
```
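The caching behaviour behind this wrinkle is standard Python: a repeat import of the same name is served from `sys.modules` without re-executing the module body, which is roughly what `deimport` undoes. A small stdlib-only illustration (using `json` as a stand-in for superimport):

```python
import sys

# The first import executes the module body and stores the module object in
# sys.modules; later imports of the same name return that cached object, so
# module-level side effects (like superimport's installs) fire only once.
import json
cached = sys.modules["json"]

import json  # cache hit: no re-execution
assert json is cached

# Removing the cache entry forces a fresh execution on the next import;
# this is essentially what deimport does for the superimport module.
del sys.modules["json"]
import json
print(json is cached)  # -> False (a new module object was created)
```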
See this guide for how to contribute code.
I would like to thank the following people for contributing to the code (list autogenerated from this page):