Python code for "Machine learning: a probabilistic perspective" (2nd edition)

pyprobml

Python 3 code for the new book series Probabilistic Machine Learning by Kevin Patrick Murphy. This is work in progress, so expect rough edges.

Running the scripts

The scripts directory contains Python files that generate individual figures from vol. 1 and vol. 2 of the book. To run an individual script from the command line, follow this example:

git clone --depth 1 https://github.com/probml/pyprobml pyprobml &> /dev/null
python3 pyprobml/scripts/softmax_plot.py

This will clone the repo (without the version history, to save time/space), run the script, plot a figure, and save the result to the pyprobml/figures directory.

Many scripts rely on external packages, such as scipy and jax. In some cases, the necessary packages are listed in the comments at the start of the file. However, rather than installing each dependency yourself, you can install a single meta-package:

pip install superimport 

Do this once on your local machine (or once per Colab instance). Then add import superimport to the top of each of your scripts. The superimport library parses your file, works out which packages are missing, and installs them before running the rest of the script as usual. (If you run the script a second time, it skips the installation step.) Because packages are installed on demand, you need an internet connection, at least the first time you run a script.
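To make the idea of "figuring out missing packages" concrete, here is a minimal sketch using only the standard library. It is not superimport's actual implementation; the function name missing_top_level_imports and the example module name are made up for illustration. It parses a script's source, collects the top-level modules it imports, and reports those that cannot be found in the current environment.

```python
import ast
import importlib.util


def missing_top_level_imports(source):
    """Return sorted top-level module names imported by `source` that are not installed.

    Hypothetical helper for illustration only; not part of superimport.
    """
    tree = ast.parse(source)
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b.c" only requires the top-level package "a".
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local modules.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    # find_spec returns None when a top-level module cannot be located.
    return sorted(n for n in names if importlib.util.find_spec(n) is None)


example = (
    "import os\n"
    "from collections import Counter\n"
    "import surely_not_installed_pkg_xyz\n"  # made-up module name
)
print(missing_top_level_imports(example))
```

Note that an import name is not always the name of the package to install (for example, sklearn is provided by scikit-learn), so a real tool also needs a mapping from import names to package names.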

Some scripts download datasets stored in the probml-data repo, which also requires an internet connection.
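If you want to avoid downloading the same dataset repeatedly in your own scripts, one option is to cache it on disk. The sketch below is an assumption about how you might do this, not how the book's scripts do it; fetch_cached is a hypothetical helper, and the URL and destination path in the usage comment are placeholders that you would replace.

```python
import urllib.request
from pathlib import Path


def fetch_cached(url, dest):
    """Download `url` to `dest` unless `dest` already exists; return the local path.

    Hypothetical helper for illustration; the pyprobml scripts may fetch data differently.
    """
    dest = Path(dest)
    if not dest.exists():
        # Create the cache directory on first use, then download the file.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest


# Usage (placeholder URL and path, shown as a comment so nothing is downloaded here):
# path = fetch_cached("<url-of-dataset-file>", "data/<dataset-file>")
```

Only the first call needs network access; later calls return the cached file without checking whether the remote copy has changed.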

Jupyter notebooks

The scripts needed to make all the figures for each chapter are automatically combined into a series of Jupyter notebooks, one per chapter.

In addition to the automatically generated notebooks, there is a series of manually created notebooks, which create additional figures and provide supplementary material for the book. These are stored in the notebooks repo, since they can be quite large. Some of these notebooks use the scripts mentioned above, but others are independent of the book content.

Colab

The best way to run the code is inside Colab. It has most of the libraries you will need (e.g., scikit-learn, JAX) pre-installed, and gives you access to a free GPU and TPU. We have created an intro to Colab notebook with more details.

You can run the book code inside colab as shown in the example below.

%%capture
!pip install superimport 
!pip install deimport

!git clone --depth 1 https://github.com/probml/pyprobml /pyprobml &> /dev/null
%cd /pyprobml/scripts

%run kf_tracking_demo.py

To run a single script directly from GitHub, follow the example below. (Note the raw in the URL.)

!wget -q https://raw.githubusercontent.com/probml/pyprobml/master/scripts/softmax_plot.py
%run softmax_plot.py
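The raw URL differs from the address you see when browsing a file on GitHub. As a small illustration, the sketch below converts a github.com "blob" page URL into the corresponding raw.githubusercontent.com URL. The helper name to_raw_url is made up for this example, and it assumes the branch name contains no slashes.

```python
from urllib.parse import urlparse


def to_raw_url(blob_url):
    """Convert a github.com 'blob' page URL into its raw.githubusercontent.com form.

    Hypothetical helper; assumes the layout <owner>/<repo>/blob/<branch>/<path>
    and a branch name without slashes.
    """
    parts = urlparse(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a blob URL: {blob_url}")
    owner, repo, _, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


print(to_raw_url("https://github.com/probml/pyprobml/blob/master/scripts/softmax_plot.py"))
# prints https://raw.githubusercontent.com/probml/pyprobml/master/scripts/softmax_plot.py
```

The result can be passed to wget as in the example above.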

To edit a file locally and then run it, follow the example below.

%load_ext autoreload
%autoreload 2

file = 'foo.py' # change this filename as needed
!touch $file # create the file if it does not exist
from google.colab import files
files.view(file) # open editor

%run $file

GCP, TPUs, and all that

When you want more power or control than Colab gives you, you should get a Google Cloud Platform (GCP) account and get access to a TPU VM. You can then use this as a virtual desktop, which you can access via ssh from inside VS Code. We have created various tutorials on Colab, GCP and TPUs with more information.

How to contribute

See this guide for how to contribute code.

GSOC 2021

For a summary of some of the contributions to this codebase during Google Summer of Code 2021, see this link.

Acknowledgements

I would like to thank the following people for contributing to the code (list autogenerated from this page):

murphyk mjsML Drishttii Duane321 gerdm animesh-007 Nirzu97 always-newbie161 karalleyna nappaillav jdf22 shivaditya-meduri Neoanarika andrewnc Abdelrahman350 Garvit9000c kzymgch alen1010 adamnemecek galv krasserm nealmcb petercerno Prahitha khanshehjad hieuza jlh2018 mvervuurt TripleTop