Code for "Deep Convolutional Networks as shallow Gaussian Processes" (arXiv, other material), by Adrià Garriga-Alonso, Laurence Aitchison and Carl Edward Rasmussen. It uses GPflow and TensorFlow.
A PyTorch version of these same experiments can be found at https://github.com/cambridge-mlg/cnn-gp.
This package has been tested only with Python 3.5 and 3.6.
First, you need to install the package in development mode. This will download and install all necessary Python dependencies:
cd convnets-as-gps
# optionally: pip install --user -e .
pip install -e .
If you have an "old" CPU, this step might crash with "Illegal instruction". This is because recent prebuilt TensorFlow binaries are compiled with AVX instructions enabled. Installing TensorFlow 1.5.0 fixes this.
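For example, to pin the CPU build (use tensorflow-gpu==1.5.0 instead if you rely on the GPU build):
pip install tensorflow==1.5.0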
The easy way to run all the experiments: read and/or run `run_all_experiments.bash`.
All the experiments in the paper are run in a two-stage process:
- Run `save_kernels.py` or `save_kernels_resnet.py` to compute kernel matrices and save them to disk in a working directory. Disk space required: about 15GB for one run. Run `python3 program.py --help` for detailed information, but here are example invocations:
python3 save_kernels.py --seed=<random seed> --n_max=200 --path=/path/to/working/directory
python3 save_kernels_resnet.py --n_gpus=1 --n_max=200 --path=/path/to/working/directory
In particular, the `n_max` flag determines how many training examples your GPU processes simultaneously. Memory requirements scale roughly with `n_max`^2; adjust the number for your particular hardware.
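As a rough illustration of that scaling, here is a back-of-the-envelope sketch. It is not a description of what `save_kernels.py` actually allocates; it assumes the dominant buffer is a single float32 tensor of per-pixel covariances of shape `n_max × n_max × H × W`:

```python
# Rough GPU-memory estimate for one kernel block.
# ASSUMPTION (not stated in this README): the dominant buffer is one
# float32 tensor of per-pixel covariances with shape (n_max, n_max, H, W).
n_max, H, W = 200, 28, 28           # MNIST images are 28x28
block_bytes = n_max**2 * H * W * 4  # 4 bytes per float32
print(f"~{block_bytes / 2**20:.0f} MiB per block")  # ~120 MiB for these values
```

Doubling `n_max` quadruples this estimate, which is why reducing it is the first thing to try when you run out of GPU memory.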
You might run into "Matrix is singular" errors. In my testing, those can be removed by reducing `n_max`. This must be a bug of some kind in the libraries I use (TensorFlow, maybe?), but I don't have the time (or the skill) to troubleshoot it. Just reduce `n_max`.
- Run `classify_gp.py` to invert the kernel matrix and calculate test results. This requires a lot of CPU RAM: at least enough to hold the matrix to invert at 64-bit precision. For MNIST, the main kernel matrix is ~12GB, so you need ~24GB of memory to maintain decent speed. I'm sure there's a way to do the inversion reasonably fast and more memory-efficiently, but that would take quite a bit of development time.
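For intuition about this step, here is a minimal sketch of a Cholesky-based solve, which avoids forming the explicit inverse. This is not how `classify_gp.py` is implemented; the `.npy` file names and label format below are hypothetical and do not match this repository's output format:

```python
import numpy as np
import scipy.linalg

# Hypothetical file names, for illustration only.
Kxx = np.load("Kxx.npy")        # (n_train, n_train) train/train kernel
Kxt = np.load("Kxt.npy")        # (n_train, n_test) train/test kernel
Y = np.load("Y_onehot.npy")     # (n_train, n_classes) one-hot labels

# Cholesky factorisation with a small jitter for numerical stability;
# overwrite_a=True avoids keeping a second copy of a very large matrix.
Kxx[np.diag_indices_from(Kxx)] += 1e-6
c, low = scipy.linalg.cho_factor(Kxx, lower=True, overwrite_a=True)
alpha = scipy.linalg.cho_solve((c, low), Y)

# GP posterior mean on the test set; the predicted class is the argmax.
mean = Kxt.T @ alpha
pred = mean.argmax(axis=1)
```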
Note: the version on arXiv is slightly newer and contains information about which hyperparameters turned out to be the most effective for each architecture.
To cite this paper:

@inproceedings{aga2018cnngp,
  author    = {{Garriga-Alonso}, Adri{\`a} and Aitchison, Laurence and Rasmussen, Carl Edward},
  title     = {Deep Convolutional Networks as shallow {G}aussian Processes},
  booktitle = {International Conference on Learning Representations},
  year      = {2019},
  url       = {https://openreview.net/forum?id=Bklfsi0cKm},
}