This post summarizes the work done over the GSoC coding period. Let's get started.
My GSoC proposal was about adding a Variational Inference interface to PyMC4. Unlike MCMC algorithms, which draw samples from the posterior, VI fits an approximating distribution to it. The plan was to implement two Variational Inference algorithms - Mean Field ADVI and Full Rank ADVI.
| Key Challenges | Solutions proposed | How it was resolved |
|---|---|---|
| `theano.clone` equivalent for TF2 | Model execution with replaced inputs | Normal distribution's `sample` method is executed over a flattened view of parameters |
| Flattened view of parameters | Use `tf.reshape()` | Used `tf.concat()` with `tf.reshape()` |
| Optimizers for ELBO | Use `tf.keras.optimizers` | Optimizers from either TFv1 or TFv2 with defaults from `pymc3.updates` can be used |
| Initialization of MeanField and Full Rank ADVI | Manually set bijectors | Relied on `tfp.TransformedVariable` |
| Progress bar | Use `tqdm` or `tf.keras.utils.Progbar` | A small hack over `tf.print` |
| Minibatch processing of data | Capture slice in memory | This is the only incomplete feature. Maybe the `tf.data.Dataset` API has to be explored more, or we implement our own `tfp.vi.fit_surrogate_posterior` function. |
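The flattened view of parameters mentioned above can be illustrated with a plain-Python sketch. In PyMC4 this is done with `tf.concat()` and `tf.reshape()` on tensors; the helper names below are hypothetical, and lists stand in for tensors purely for illustration:

```python
def flatten_params(params):
    """Concatenate every parameter's values into one flat vector,
    remembering each parameter's name and length so the flat view
    can be inverted later (tf.concat in the real code)."""
    flat, spec = [], []
    for name, values in params.items():
        spec.append((name, len(values)))
        flat.extend(values)
    return flat, spec

def unflatten_params(flat, spec):
    """Invert flatten_params: slice the flat vector back into the
    original per-parameter chunks (tf.reshape in the real code)."""
    out, offset = {}, 0
    for name, length in spec:
        out[name] = flat[offset:offset + length]
        offset += length
    return out

params = {"mu": [0.1, 0.2], "sigma": [1.0]}
flat, spec = flatten_params(params)
assert flat == [0.1, 0.2, 1.0]
assert unflatten_params(flat, spec) == params
```

With this view, a single surrogate distribution over the flat vector can replace per-parameter surrogates, which is what makes the `theano.clone`-free workflow possible.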
- This was a super interesting period. I got to know many PyMC core developers through Slack.
- I spent the entire time learning about the basics of Bayesian statistics, prior, posterior predictive checks, and the theory of Variational Inference.
- I also wrote a blog post during this interval about the nuts and bolts of VI and the implementation of Mean Field ADVI in TensorFlow Probability. Here is the blog post - Demystify Variational Inference.
- The most difficult part of learning VI was to understand the transformations because PyMC3 and TFP handle transformations differently.
The coding period started on June 1, and my goal for this period was to add a very basic and general Variational Inference interface to PyMC4. Here is the PR #280, and the workflow of the basic interface was:
- Get the vectorized `log_prob` of the model.
- For each parameter of the model, have a Normal distribution with the same shape, and build a posterior using `tfd.JointDistributionSequential`.
- Add optimizers with defaults from PyMC3 and perform VI using `tfp.vi.fit_surrogate_posterior`.
- Sample from `tfd.JointDistributionSequential`; there is no need for an equivalent of `theano.clone`.
- Transform the samples by querying the `SamplingState`, but `Deterministics` have to be added as well.
- Resolve shape issues with ArviZ; in short, making `chains=1`.
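The core idea behind this workflow can be shown end-to-end on a toy problem. The sketch below is not PyMC4's implementation: it fits a mean-field Normal surrogate to a single Gaussian target using closed-form ELBO gradients (which exist only because the target is Gaussian), whereas the real interface relies on Monte Carlo estimates computed by `tfp.vi.fit_surrogate_posterior`:

```python
import math

# Target posterior: N(target_mu, target_sigma). ADVI fits a surrogate
# q = N(mu, sigma) by maximizing the ELBO:
#   ELBO = E_q[log p(z)] + H(q)
# For a Gaussian target both terms are available in closed form, so plain
# gradient ascent suffices here; real ADVI estimates these expectations
# by Monte Carlo with the reparameterization trick.
target_mu, target_sigma = 2.0, 0.5

mu, log_sigma = 0.0, 0.0   # unconstrained variational parameters
lr = 0.05                  # modest learning rate (see the 1e-3 advice below)
for _ in range(2000):
    sigma = math.exp(log_sigma)
    # d ELBO / d mu        = -(mu - target_mu) / target_sigma**2
    # d ELBO / d log_sigma = 1 - sigma**2 / target_sigma**2
    grad_mu = -(mu - target_mu) / target_sigma ** 2
    grad_log_sigma = 1.0 - sigma ** 2 / target_sigma ** 2
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

# The optimum recovers the target: mu -> 2.0, sigma -> 0.5
```

Parameterizing the scale as `log_sigma` keeps the optimization unconstrained, which is the same reason the interface transforms bounded parameters to the real line before fitting.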
I got the basic interface merged by late June, and then it was time to work on Full Rank ADVI. I managed to open PR #289 with the Full Rank ADVI interface by the end of June.
This was the most dramatic month of the GSoC coding period, because the Full Rank ADVI proposed in PR #289 resulted in errors most of the time. Here is the gist of the workflow that was followed to get some useful insights about the errors:
- Instead of solving the shape issues independently and posing a `MvNormal` distribution for each parameter, build the posterior using a flattened view of parameters.
- There were lots of NaNs in the ELBO because of improper handling of transformations. As a result, `Interval`, `LowerBounded`, and `UpperBounded` transformations were added as well.
- Then came the issue of Cholesky decomposition errors while working with Gaussian Processes and Variational Inference. Here are a few insights after rigorous testing with different inputs:
  - Use dtype `tf.float64` with FullRank ADVI to maintain positive definiteness of the covariance matrix.
  - Avoid aggressive optimization of the ELBO; maintain learning rates around `1e-3`.
  - Stabilize the diagonal of the covariance matrix by adding a small jitter.
  - Double-check for NaNs in the data.
- Here are the results after trying reparametrization and different jitter amounts while doing VI.
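The jitter trick can be demonstrated with a small hand-rolled Cholesky factorization (in TF one would use `tf.linalg.cholesky`; the naive algorithm and the degenerate example matrix below are mine, for illustration only):

```python
import math

def cholesky(a):
    """Naive Cholesky factorization of a symmetric matrix; math.sqrt
    raises ValueError when the matrix is not positive definite."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)  # fails if a[i][i] - s < 0
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l

def add_jitter(a, eps=1e-6):
    """Stabilize the diagonal: return a + eps * I."""
    return [[a[i][j] + (eps if i == j else 0.0) for j in range(len(a))]
            for i in range(len(a))]

# A covariance estimate that lost positive definiteness to rounding error:
cov = [[1.0, 1.0], [1.0, 1.0 - 1e-16]]
# cholesky(cov) raises ValueError; cholesky(add_jitter(cov)) succeeds.
```

The same effect explains the `tf.float64` advice: in float32 the rounding error that breaks positive definiteness shows up much sooner.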
I got this PR merged by the end of July, and then it was time to work on adding some features to ADVI.
After adding the missing transformations in PR #289, my mentor asked me to write a proposal so that Bounded distributions inherit their transformations instead of us applying transformations manually to each distribution. I explored every possibility of making a generalized version of transformations as is done in PyMC3 using `tf.cond`. Since we do not have values before model execution, it was difficult to use `tf.cond`. Here is the proposal's source.
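The idea in the proposal (dispatch on a distribution's support instead of hard-coding a transform per distribution) can be sketched in plain Python. The function name `pick_transform` and the bare `(forward, inverse)` pairs are hypothetical; the actual interface builds TFP bijectors:

```python
import math

def pick_transform(lower=None, upper=None):
    """Return a (forward, inverse) pair mapping the constrained support
    to the unconstrained real line, chosen from the bounds alone."""
    if lower is not None and upper is not None:
        # Interval (lower, upper): logit / scaled-sigmoid pair.
        width = upper - lower
        fwd = lambda x: math.log((x - lower) / (upper - x))
        inv = lambda y: lower + width / (1.0 + math.exp(-y))
        return fwd, inv
    if lower is not None:
        # LowerBounded: shifted log / exp pair.
        return (lambda x: math.log(x - lower),
                lambda y: lower + math.exp(y))
    if upper is not None:
        # UpperBounded: mirrored log / exp pair.
        return (lambda x: math.log(upper - x),
                lambda y: upper - math.exp(y))
    # Unbounded support: identity.
    return (lambda x: x), (lambda y: y)

fwd, inv = pick_transform(lower=0.0, upper=1.0)
# Round trip: inv(fwd(x)) == x up to floating point error.
```

Selecting by bounds like this needs the support to be known statically, which is exactly what made `tf.cond` on runtime values awkward.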
After getting an interface to use MeanField and FullRank ADVI, some features were included in PR #310:
- Add a progress bar (a small hack over `tf.print`).
- Test the progress bar on different operating systems.
- Add `ParameterConvergence` criteria to test convergence.
- Add a LowRank approximation.
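A parameter-convergence criterion of this flavor can be sketched as a windowed relative-change check. The function name, window size, and tolerance below are illustrative, not PyMC4's actual defaults:

```python
def parameters_converged(history, window=5, tol=1e-3):
    """Declare convergence when, over the last `window` optimization steps,
    every parameter's absolute change stays below `tol` scaled by
    (1 + |value|), so the test is relative for large values and
    absolute near zero."""
    if len(history) <= window:
        return False
    recent = history[-window - 1:]
    for prev, cur in zip(recent, recent[1:]):
        for p, c in zip(prev, cur):
            if abs(c - p) > tol * (1.0 + abs(c)):
                return False
    return True

# A trace that settles: big moves early, tiny moves later.
trace = [[1.0 / (i + 1), 2.0 + 1.0 / (i + 1) ** 2] for i in range(50)]
assert not parameters_converged(trace[:10])
assert parameters_converged(trace)
```

Checking a window rather than a single step avoids stopping on a lucky small update while the ELBO is still moving.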
I am still working on adding examples of hierarchical models, and I hope to get the PR merged soon.
Here are the Pull Requests I opened and got merged during GSoC. I have explained each one above, but here I summarize them.
- Add Variational Inference Interface: #280
- Add Full Rank Approximation: #289
- Add features to ADVI: #310 (WIP)
- Remove transformations for Discrete distributions: #314
Whatever experiments I perform to aid my learning, I polish them and share them through GitHub gists. I do not know why, but I have started to love sharing code through GitHub gists rather than Colab or a GitHub repo. Here are all the experiments I performed with ADVI during this summer.
- Comparison of MeanField ADVI in TFP, PyMC3, PyMC4: Source
- Demonstration of shape issues while working with InferenceData: Source
- Playing around Convergence and Optimizers: Source
- Tracking all parameters including deterministics: Source
- Implementation of FullRank ADVI in TFP: Source
- Comparison of MeanField and FullRank ADVI over correlated Gaussians: Source
- Model flattening and Full Rank ADVI in PyMC4: Source
- Missing transformations in PyMC4: Source
- Testing transformations in PyMC4: Source
- Distribution Enhancement Proposal: Source
- Hacking `tf.print` for progress bar: Source
- Parameter Convergence Checks in TFP: Source
Some future tasks I would like to work upon -
- Configure Mini Batch processing of data.
- Add Normalizing Flows to variational inference interface.
- Add support of Variational AutoEncoders to PyMC4.
It was an incredible experience contributing to open source, and I have improved my Python skills along the way. I want to thank my mentors @ferrine and @twiecki for being extremely supportive throughout this entire journey. I am loving my time with the PyMC community. I also want to thank @numfocus for sharing this opportunity via Google Summer of Code.
Thank you for being a part of this fantastic summer.
With ❤️, Sayam Kumar