Python code for "Machine learning: a probabilistic perspective" (2nd edition)

Primary language: Python. License: MIT.

pyprobml

Python 3 code for the new book series Probabilistic Machine Learning by Kevin Patrick Murphy. This is work in progress, so expect rough edges.

Running the scripts

The scripts directory contains Python files that generate individual figures from vol 1 and vol 2 of the book. To run an individual script from the command line, follow this example:

git clone --depth 1 https://github.com/probml/pyprobml &> /dev/null
python3 pyprobml/scripts/softmax_plot.py

This will clone the repo (without the version history, to save time/space), run the script, plot a figure, and save the result to the pyprobml/figures directory.

Many scripts rely on external packages, such as scipy and jax. In some cases, the necessary packages are listed in comments at the start of the file. However, rather than installing each dependency yourself, you can install a single meta-package:

pip install superimport

Do this once on your local machine (or once per Colab instance). Then add import superimport to the top of each of your scripts. The superimport library will parse your file, figure out which packages are missing, and install them for you before running the rest of the script as usual. (If you run the script a second time, it skips the installation step.) The first run of a script therefore needs an internet connection.
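To make the "parse your file, figure out all missing packages" step concrete, here is a minimal sketch of the general idea using only the standard library. It is not superimport's actual implementation, which is not shown in this README; the function name find_missing_imports and the example script text are hypothetical, and the sketch only detects missing modules without installing anything.

```python
import ast
import importlib.util


def find_missing_imports(source):
    """Return the top-level modules imported by `source` that are not installed."""
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import a.b.c` -> the top-level package is `a`
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # `from a.b import c` -> top-level package is `a`; relative imports are skipped
            names.add(node.module.split(".")[0])
    # find_spec returns None when a top-level module cannot be located
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


# Hypothetical script text; the last import is a made-up package name.
example_script = """
import os
import json
from collections import OrderedDict
import surely_not_an_installed_package_xyz
"""

missing = find_missing_imports(example_script)
print(missing)
```

Note that a module name is not always the same as the name of the package that provides it (for example, the sklearn module comes from the scikit-learn package), so a real tool also needs a mapping from module names to installable package names.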

Some scripts download datasets stored in the probml-data repo, so they also need an internet connection at run time.

Jupyter notebooks

The scripts needed to make all the figures for each chapter are automatically combined into a series of Jupyter notebooks, one per chapter.

In addition to the automatically generated notebooks, there is a series of manually created notebooks that create additional figures and provide supplementary material for the book. These are stored in the notebooks repo, since they can be quite large. Some of these notebooks use the scripts mentioned above, but others are independent of the book content.

Colab, GCP, TPUs, and all that

When you open a Jupyter notebook, there will be a button at the top that says 'Open in colab'. If you click on this, it will start a virtual machine (VM) instance on Google Cloud Platform (GCP), running Colab. This has most of the libraries you will need (e.g., scikit-learn, JAX) pre-installed, and gives you access to a free GPU and TPU. We have created various tutorials on Colab, GCP and TPUs with more information.

Colab has many ML-related packages already installed, but not all. We use superimport, mentioned above, to automatically install the missing ones. One wrinkle arises if you try to run multiple scripts inside a single Colab session (e.g., using %run foo.py and then %run bar.py). Because Python caches modules that have already been imported, superimport only runs the first time, so it will work on foo but fail on bar. To force it to run for each script, you need to unimport the superimport symbol before running the script, like this:

from deimport.deimport import deimport
deimport(superimport)
%run myscript.py
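The behaviour described above comes from Python's module cache: once a module is in sys.modules, a later import statement reuses it without running the module body again. The sketch below demonstrates that mechanism with a throwaway module (the name demo_side_effect is hypothetical) that logs a line each time its body runs. It uses only the standard library and is not a description of how deimport works internally.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module whose body appends one line to a log file each time it runs.
module_dir = tempfile.mkdtemp()
log_path = os.path.join(module_dir, "runs.log")
with open(os.path.join(module_dir, "demo_side_effect.py"), "w") as f:
    f.write(
        "with open({!r}, 'a') as log:\n"
        "    log.write('ran\\n')\n".format(log_path)
    )
sys.path.insert(0, module_dir)
importlib.invalidate_caches()


def run_count():
    """Number of times the module body has executed so far."""
    with open(log_path) as log:
        return len(log.read().splitlines())


importlib.import_module("demo_side_effect")  # first import: the body runs
importlib.import_module("demo_side_effect")  # cached in sys.modules: the body does not run
count_after_cached_import = run_count()

del sys.modules["demo_side_effect"]          # forget the cached module
importlib.import_module("demo_side_effect")  # the body runs again
count_after_reimport = run_count()

print(count_after_cached_import, count_after_reimport)
```

Removing the entry from sys.modules is the simplest way to show the effect; in a Colab session the deimport call shown earlier is the approach this README recommends.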

How to contribute

See this guide for how to contribute code.

Acknowledgements

I would like to thank the following people for contributing to the code (list autogenerated from this page):

murphyk mjsML Drishttii Duane321 gerdm animesh-007 Nirzu97 always-newbie161 karalleyna nappaillav jdf22 shivaditya-meduri Neoanarika andrewnc Abdelrahman350 Garvit9000c kzymgch alen1010 adamnemecek galv krasserm nealmcb petercerno Prahitha khanshehjad hieuza jlh2018 mvervuurt TripleTop