Randomized Algorithms for Gaussian Process Inference. Semester project, ANCHP Lab, EPFL.
Gaussian process inference engine based on GPyTorch (article, code). Theoretical analysis and numerical experiments can be found in the [report] of this repository. The engine relies on conjugate gradients and stochastic Lanczos quadrature, with a partial pivoted Cholesky factorization used as a low-rank approximation for preconditioning.
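To make the preconditioning idea concrete, the following NumPy sketch builds a rank-k partial pivoted Cholesky factor L of a kernel matrix K, applies the inverse of the preconditioner L Lᵀ + σ²I through the Woodbury identity, and uses it inside a standard preconditioned conjugate gradient loop. It is an illustration only: the function names (pivoted_cholesky, make_preconditioner, pcg) are hypothetical and are not taken from this repository, which implements the batched mBCG variant rather than this single right-hand-side version.

```python
import numpy as np


def pivoted_cholesky(K, rank):
    """Partial pivoted Cholesky: returns L (n x k, k <= rank) with K ~= L @ L.T."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # diagonal of the current residual
    L = np.zeros((n, rank))
    for j in range(rank):
        p = int(np.argmax(d))             # pivot: largest remaining diagonal entry
        if d[p] <= 1e-12:                 # residual is numerically zero, stop early
            return L[:, :j]
        L[:, j] = (K[:, p] - L[:, :j] @ L[p, :j]) / np.sqrt(d[p])
        d -= L[:, j] ** 2
    return L


def make_preconditioner(L, noise):
    """Return r -> (L L^T + noise * I)^{-1} r, applied via the Woodbury identity."""
    k = L.shape[1]
    inner = noise * np.eye(k) + L.T @ L   # small k x k system

    def apply(r):
        return (r - L @ np.linalg.solve(inner, L.T @ r)) / noise

    return apply


def pcg(matvec, b, precond, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

Passing `lambda r: r` as `precond` gives plain CG, which makes it easy to compare iteration counts with and without the low-rank preconditioner on a small kernel matrix.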
First install the Python packages used in this project.
The environment file is provided as environment.yml, which also includes the JupyterLab package needed to view the notebook. The easiest way to install everything is:
- Install Miniconda (a lightweight version of Anaconda)
- Create the conda gpr (Gaussian Process Regression) environment that is later used to run the code:
conda env create -f environment.yml
Once installation is complete, activate the environment:
user@host:/path$ conda activate gpr
(gpr) user@host:/path$ # gpr now appears, i.e. the environment is active
Run the script run_example.py to see the inference engine in action on a toy dataset.
First activate the gpr environment, then run the script:
$ conda activate gpr
(gpr) $ python run_example.py
This prints the likelihood at each optimization step and produces a figure.
The user entry points are the GPModel class in model.py and the kernels in kernel.py.
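For orientation, here is a dense reference sketch of the two ingredients named above: a kernel function and the Gaussian log marginal likelihood computed from it. It uses an exact Cholesky factorization, which is the O(n³) computation that iterative engines of this kind are meant to avoid, so it is useful mainly as a ground truth on small problems. The function names and signatures are hypothetical and do not reflect the actual GPModel or kernel.py interfaces; inputs are assumed to be one-dimensional.

```python
import numpy as np


def squared_exponential(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel matrix for 1-D inputs."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)


def matern32(x1, x2, lengthscale=1.0, variance=1.0):
    """Matern kernel matrix with smoothness 3/2 for 1-D inputs."""
    r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def exact_log_marginal_likelihood(K, y, noise):
    """log p(y) for y ~ N(0, K + noise * I), computed with a dense Cholesky factor."""
    n = y.shape[0]
    C = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(C.T, np.linalg.solve(C, y))   # (K + noise I)^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(C)))             # log det(K + noise I)
    return -0.5 * (y @ alpha) - 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi)


# Small usage example on synthetic data (values chosen arbitrarily).
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x)
K = squared_exponential(x, x, lengthscale=0.3)
print(exact_log_marginal_likelihood(K, y, noise=1e-2))
```

The linear solve and the log-determinant in `exact_log_marginal_likelihood` are the two terms that conjugate gradients and stochastic Lanczos quadrature approximate without ever factorizing the matrix.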
src folder contains
- cg.py: mBCG algorithm
- quadrature.py: Lanczos quadrature
- chol.py: pivoted Cholesky algorithm
- precond.py: routines for preconditioners of kernel matrices
- kernel.py: squared exponential and Matern kernels
- lanczos.py: Lanczos method for linear systems, used for debugging
- inference.py: puts the pieces together to compute the linear solve, logdet and trace term
- model.py: computes the likelihood and its gradient, performs optimization to fit hyperparameters
tests: unit tests for algorithm correctness
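As a rough picture of what the Lanczos quadrature and logdet pieces listed above do, the sketch below estimates log det(A) for a symmetric positive definite matrix using only matrix-vector products. Each Rademacher probe z yields a small tridiagonal matrix T from the Lanczos process, and zᵀ log(A) z is approximated by ‖z‖² e₁ᵀ log(T) e₁. This is a standalone illustration under simplifying assumptions (separate Lanczos runs with full reorthogonalization, no preconditioning); the names lanczos_tridiag and slq_logdet are hypothetical and the repository's own routines may be organized differently.

```python
import numpy as np


def lanczos_tridiag(matvec, v, m):
    """Run up to m Lanczos steps from v; return the diagonals (alpha, beta) of T."""
    n = v.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    q = v / np.linalg.norm(v)
    for j in range(m):
        Q[:, j] = q
        w = matvec(q)
        alpha[j] = q @ w
        # Full reorthogonalization against all Lanczos vectors built so far.
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j == m - 1:
            break
        b = np.linalg.norm(w)
        if b < 1e-12:                     # Krylov space exhausted, stop early
            return alpha[: j + 1], beta[:j]
        beta[j] = b
        q = w / b
    return alpha, beta


def slq_logdet(matvec, n, num_probes=30, num_steps=30, rng=None):
    """Stochastic Lanczos quadrature estimate of log det(A) for SPD A of size n."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)           # Rademacher probe, ||z||^2 = n
        alpha, beta = lanczos_tridiag(matvec, z, min(num_steps, n))
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, U = np.linalg.eigh(T)                  # quadrature nodes and vectors
        weights = U[0, :] ** 2                        # quadrature weights
        total += n * np.sum(weights * np.log(theta))  # ~ z^T log(A) z
    return total / num_probes
```

On small matrices the estimate can be checked against `np.linalg.slogdet`; the accuracy depends on both the number of probes and the number of Lanczos steps.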
All numerical experiments are gathered in the Jupyter notebook notebook/Numerical Experiments.ipynb.
To run it, cd into the root of this project and run:
user@host:/path$ conda activate gpr
(gpr) user@host:/path$ jupyter lab
This should open a browser tab with a Jupyter server in which you can view and run the notebook. Make sure to run the cells in order.
Some cells perform heavy computations; they start with %%time, and the running time appears under the cell. These cells can be skipped, since their results can be loaded in the cells that follow.