This post summarizes the work done over the GSoC coding period. Let's get started.
My GSoC proposal was about adding a Variational Inference interface to PyMC4. Unlike MCMC algorithms, which draw samples from the posterior, VI fits an approximating distribution to it. The plan was to implement two Variational Inference algorithms - Mean Field ADVI and Full Rank ADVI.
| Key Challenges | Solutions proposed | How it was resolved |
|---|---|---|
| `theano.clone` equivalent for TF2 | Model execution with replaced inputs | Normal distribution's `sample` method is executed over a flattened view of parameters |
| Flattened view of parameters | Use `tf.reshape()` | Used `tf.concat()` with `tf.reshape()` |
| Optimizers for ELBO | Use `tf.keras.optimizers` | Optimizers from either TFv1 or TFv2 with defaults from `pymc3.updates` can be used |
| Initialization of MeanField and Full Rank ADVI | Manually set bijectors | Relied on `tfp.TransformedVariable` |
| Progress bar | Use `tqdm` or `tf.keras.utils.Progbar` | A small hack over `tf.print` |
| Minibatch processing of data | Capture slice in memory | This is the only incomplete feature. Maybe the `tf.data.Dataset` API has to be explored more, or we implement our own `tfp.vi.fit_surrogate_posterior` function. |
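The flattened view of parameters mentioned above can be illustrated with a plain-Python sketch. In PyMC4 this is done with `tf.concat()` and `tf.reshape()` on tensors; the helper names below are hypothetical, and lists stand in for tensors purely for illustration:

```python
def flatten_params(params):
    """Concatenate every parameter's values into one flat vector,
    remembering each parameter's name and length so the flat view
    can be inverted later (tf.concat in the real code)."""
    flat, spec = [], []
    for name, values in params.items():
        spec.append((name, len(values)))
        flat.extend(values)
    return flat, spec

def unflatten_params(flat, spec):
    """Invert flatten_params: slice the flat vector back into the
    original per-parameter chunks (tf.reshape in the real code)."""
    out, offset = {}, 0
    for name, length in spec:
        out[name] = flat[offset:offset + length]
        offset += length
    return out

params = {"mu": [0.1, 0.2], "sigma": [1.0]}
flat, spec = flatten_params(params)
assert flat == [0.1, 0.2, 1.0]
assert unflatten_params(flat, spec) == params
```

With this view, a single surrogate distribution over the flat vector can replace per-parameter surrogates, which is what makes the `theano.clone`-free workflow possible.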
- This was a super interesting period. I got to know many PyMC core developers through Slack.
- I spent the entire time learning about the basics of Bayesian statistics, prior, posterior predictive checks, and the theory of Variational Inference.
- I also wrote a blog post during this interval about the nuts and bolts of VI and the implementation of Mean Field ADVI in TensorFlow Probability. Here is the blog post - Demystify Variational Inference.
- The most difficult part of learning VI was to understand the transformations because PyMC3 and TFP handle transformations differently.
The coding period started on June 1, and my goal for this period was to add a very basic and general Variational Inference interface to PyMC4. Here is the PR #280, and the workflow of the basic interface was:
- Get the vectorized `log_prob` of the model.
- For each parameter of the model, have a Normal distribution with the same shape, and build a posterior using `tfd.JointDistributionSequential`.
- Add optimizers with defaults from PyMC3 and perform VI using `tfp.vi.fit_surrogate_posterior`.
- Sample from `tfd.JointDistributionSequential`; there is no need for an equivalent of `theano.clone`.
- Transform the samples by querying the `SamplingState`, but `Deterministics` have to be added as well.
- Resolve shape issues with ArviZ; in short, making `chains=1`.
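The core idea behind this workflow can be shown end-to-end on a toy problem. The sketch below is not PyMC4's implementation: it fits a mean-field Normal surrogate to a single Gaussian target using closed-form ELBO gradients (which exist only because the target is Gaussian), whereas the real interface relies on Monte Carlo estimates computed by `tfp.vi.fit_surrogate_posterior`:

```python
import math

# Target posterior: N(target_mu, target_sigma). ADVI fits a surrogate
# q = N(mu, sigma) by maximizing the ELBO:
#   ELBO = E_q[log p(z)] + H(q)
# For a Gaussian target both terms are available in closed form, so plain
# gradient ascent suffices here; real ADVI estimates these expectations
# by Monte Carlo with the reparameterization trick.
target_mu, target_sigma = 2.0, 0.5

mu, log_sigma = 0.0, 0.0   # unconstrained variational parameters
lr = 0.05                  # modest learning rate (see the 1e-3 advice below)
for _ in range(2000):
    sigma = math.exp(log_sigma)
    # d ELBO / d mu        = -(mu - target_mu) / target_sigma**2
    # d ELBO / d log_sigma = 1 - sigma**2 / target_sigma**2
    grad_mu = -(mu - target_mu) / target_sigma ** 2
    grad_log_sigma = 1.0 - sigma ** 2 / target_sigma ** 2
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

# The optimum recovers the target: mu -> 2.0, sigma -> 0.5
```

Parameterizing the scale as `log_sigma` keeps the optimization unconstrained, which is the same reason the interface transforms bounded parameters to the real line before fitting.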
I got the basic interface merged by late June, and then it was time to work on Full Rank ADVI. I managed to open PR #289 with the Full Rank ADVI interface by the end of June.
This was the most dramatic month of the GSoC coding period, because the Full Rank ADVI proposed in PR #289 resulted in errors most of the time. Here is the gist of the workflow that was followed to get some useful insights about the errors:
- Instead of solving the shape issues independently and posing a `MvNormal` distribution for each parameter, build the posterior using a flattened view of parameters.
- There were lots of NaNs in the ELBO because of improper handling of transformations. As a result, `Interval`, `LowerBounded`, and `UpperBounded` transformations were added as well.
- Then came the issue of Cholesky decomposition errors while working with Gaussian Processes and Variational Inference. Here are a few insights after rigorous testing with different inputs:
  - Use dtype `tf.float64` with FullRank ADVI to maintain positive definiteness of the covariance matrix.
  - Avoid aggressive optimization of the ELBO; maintain learning rates around `1e-3`.
  - Stabilize the diagonal of the covariance matrix by adding a small jitter.
  - Double-check for NaNs in the data.
- Here are the results after trying reparametrization and different jitter amounts while doing VI.
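The jitter trick can be demonstrated with a small hand-rolled Cholesky factorization (in TF one would use `tf.linalg.cholesky`; the naive algorithm and the degenerate example matrix below are mine, for illustration only):

```python
import math

def cholesky(a):
    """Naive Cholesky factorization of a symmetric matrix; math.sqrt
    raises ValueError when the matrix is not positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)  # fails if a[i][i] - s < 0
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def add_jitter(a, eps=1e-6):
    """Stabilize the diagonal: return a + eps * I."""
    return [[a[i][j] + (eps if i == j else 0.0) for j in range(len(a))]
            for i in range(len(a))]

# A covariance estimate that lost positive definiteness to rounding error:
cov = [[1.0, 1.0], [1.0, 1.0 - 1e-16]]
# cholesky(cov) raises ValueError; cholesky(add_jitter(cov)) succeeds.
```

The same effect explains the `tf.float64` advice: in float32 the rounding error that breaks positive definiteness shows up much sooner.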
I got this PR merged by the end of July, and then it was time to work on adding some features to ADVI.
After adding the missing transformations in PR #289, my mentor asked me to write a proposal so that Bounded distributions inherit their transformations instead of us applying transformations manually to each distribution. I explored every possibility of making a generalized version of transformations as is done in PyMC3 using `tf.cond`. Since we do not have values before model execution, it was difficult to use `tf.cond`. Here is the proposal's source.
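The idea in the proposal (dispatch on a distribution's support instead of hard-coding a transform per distribution) can be sketched in plain Python. The function name `pick_transform` and the bare `(forward, inverse)` pairs are hypothetical; the actual interface builds TFP bijectors:

```python
import math

def pick_transform(lower=None, upper=None):
    """Return a (forward, inverse) pair mapping the constrained support
    to the unconstrained real line, chosen from the bounds alone."""
    if lower is not None and upper is not None:
        # Interval (lower, upper): logit / scaled-sigmoid pair.
        width = upper - lower
        fwd = lambda x: math.log((x - lower) / (upper - x))
        inv = lambda y: lower + width / (1.0 + math.exp(-y))
        return fwd, inv
    if lower is not None:
        # LowerBounded: shifted log / exp pair.
        return (lambda x: math.log(x - lower),
                lambda y: lower + math.exp(y))
    if upper is not None:
        # UpperBounded: mirrored log / exp pair.
        return (lambda x: math.log(upper - x),
                lambda y: upper - math.exp(y))
    # Unbounded support: identity.
    return (lambda x: x), (lambda y: y)

fwd, inv = pick_transform(lower=0.0, upper=1.0)
# Round trip: inv(fwd(x)) == x up to floating point error.
```

Selecting by bounds like this needs the support to be known statically, which is exactly what made `tf.cond` on runtime values awkward.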
After getting an interface to use MeanField and FullRank ADVI, some features were included in PR #310:
- Add a progress bar (a small hack over `tf.print`).
- Test the progress bar on different operating systems.
- Add `ParameterConvergence` criteria to test convergence.
- Add a LowRank approximation.
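A parameter-convergence criterion of this flavor can be sketched as a windowed relative-change check. The function name, window size, and tolerance below are illustrative, not PyMC4's actual defaults:

```python
def parameters_converged(history, window=5, tol=1e-3):
    """Declare convergence when, over the last `window` optimization steps,
    every parameter's absolute change stays below `tol` scaled by
    (1 + |value|), so the test is relative for large values and
    absolute near zero."""
    if len(history) <= window:
        return False
    recent = history[-window - 1:]
    for prev, cur in zip(recent, recent[1:]):
        for p, c in zip(prev, cur):
            if abs(c - p) > tol * (1.0 + abs(c)):
                return False
    return True

# A trace that settles: big moves early, tiny moves later.
trace = [[1.0 / (i + 1), 2.0 + 1.0 / (i + 1) ** 2] for i in range(50)]
assert not parameters_converged(trace[:10])
assert parameters_converged(trace)
```

Checking a window rather than a single step avoids stopping on a lucky small update while the ELBO is still moving.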
I am still working on adding examples of hierarchical models, and I hope to get the PR merged soon.
Here are the Pull Requests I opened and got merged during GSoC. I have explained each one above, but here I summarize them.
- Add Variational Inference Interface: #280
- Add Full Rank Approximation: #289
- Add features to ADVI: #310 (WIP)
- Remove transformations for Discrete distributions: #314
Whatever experiments I perform to aid my learning, I polish them and share them through GitHub gists. I do not know why, but I have started to love sharing code through GitHub gists rather than Colab or a GitHub repo. Here are all the experiments I performed with ADVI during this summer.
- Comparison of MeanField ADVI in TFP, PyMC3, PyMC4: Source
- Demonstration of shape issues while working with InferenceData: Source
- Playing around Convergence and Optimizers: Source
- Tracking all parameters including deterministics: Source
- Implementation of FullRank ADVI in TFP: Source
- Comparison of MeanField and FullRank ADVI over correlated Gaussians: Source
- Model flattening and Full Rank ADVI in PyMC4: Source
- Missing transformations in PyMC4: Source
- Testing transformations in PyMC4: Source
- Distribution Enhancement Proposal: Source
- Hacking `tf.print` for progress bar: Source
- Parameter Convergence Checks in TFP: Source
Some future tasks I would like to work upon -
- Configure Mini Batch processing of data.
- Add Normalizing Flows to variational inference interface.
- Add support of Variational AutoEncoders to PyMC4.
It was an incredible experience contributing to open source, and I have improved my Python skills along the way. I want to thank my mentors @ferrine and @twiecki for being extremely supportive throughout this entire journey. I am loving my time with the PyMC community. I also want to thank @numfocus for sharing this opportunity via Google Summer of Code.
Thank you for being a part of this fantastic summer.
With ❤️, Sayam Kumar