Make new API for HoloPy (version 4)
vnmanoharan opened this issue · 1 comments
Proposal
HoloPy version 4 should rely on pymc
for inference.
Rationale
When we first included Bayesian inference in HoloPy, the options for efficient gradient-free samplers were limited, and we chose emcee
because it was simple to implement and worked well on many of our inference problems. However, we had to write a lot of code to handle priors, likelihoods, and models, and that code is difficult to maintain and to adapt to other kinds of samplers.
pymc
has a nice model-specification syntax that includes a variety of distributions that can be used for priors or likelihoods, and it now includes several gradient-free samplers. Previously it had only a Metropolis sampler, Gibbs sampler, and NUTS sampler, but we couldn't take advantage of NUTS because we don't have analytic gradients for most of our scattering modules, and the fortran codes are not auto-differentiable in python. Metropolis and Gibbs are too inefficient for our purposes. But now pymc
has a Differential Evolution Metropolis Sampler that, like emcee
, is ensemble-based (uses multiple walkers, though the MC moves it makes are different from those in the affine-invariant ensemble sampler used in emcee
). pymc
also has a Sequential Monte Carlo sampler that is tempered, making it useful for multi-modal posteriors and for calculating marginal likelihoods and Bayes factors for model comparison. These two samplers might suffice for most of our use cases. They seem to work reasonable well in moderate-dimensional spaces (10 parameters or so), which corresponds to many of our models.
In addition, we would get a full model-specification syntax that is similar to that used in other modern statistical modeling probabilistic programming packages, like STAN and NumPyro. We would no longer have to maintain lots of code to handle objects and methods for priors and models.
Downsides
This change will break all existing workflows, and there's no guarantee that the new samplers will work. Also, a lot of code will have be removed, refactored, and (re)written.
Implementation challenges
- The main challenge is that
pymc
models are based onPyTensor
, which allows for auto-differentiation. Even though we won't make use of auto-differentiation, we still need to define our model in a way thatPyTensor
can deal with it. There is a nice guide on how to do this for "black-box" likelihoods (models, like ours, that cannot be inspected and differentiated by PyTensor). It involves creating a customPyTensor Op
that runs the black-box code (in our case, the forward model for hologram formation) and returns the log likelihood. Ideally we would have a function that outputs apymc
model, with the appropriateOp
set up for calculating the log-likelihood for a forward model that is defined in HoloPy. The user can then augment the model with priors and run samplers using the usualpymc
syntax. - Optimization is a separate problem than sampling. We have lots of code in HoloPy that attempts to find good initial guesses for sampling, or tries to find the maximum of the posterior to provide a point-estimate for parameters. We use CMAES and
scipy.optimize
, for example. I think these algorithms can still be used, but it will require connecting them to thelogp
(log of posterior probability) output from apymc
model. There appears to be a way to calculate thelogp
after the model is specified. I'm not sure how much overhead this will add to the optimization, but in almost all cases our overhead is dominated by the scattering calculation, not the optimizer. Note also thatpm.find_MAP()
can also be used and has several options for optimizers. - Parallelization may be tricky. Here I'm not talking about parallelizing MC chains, but fitting models to multiple frames in a time-series in parallel (on, for example, a computing cluster). In the past, we've had many issues with making sure objects can be serialized for multiprocessing, and I don't know what will be need to be serialized in order to run multiple
pymc
samplers in parallel.
Comments are welcome.
Regarding the last point (parallelization), a new package called UM-Bridge has been released that creates a bridge between numerical forward models (like ours) to pymc
. It enables the use of pymc
on computing clusters. This type of bridge might be the key to complete separating forward modeling (what HoloPy does) from sampling (what pymc
and other packages do).
Documentation: https://um-bridge-benchmarks.readthedocs.io/en/docs/
Use in HPC/slurm: https://um-bridge-benchmarks.readthedocs.io/en/docs/umbridge/hpc.html
Example of use with pymc
: https://um-bridge-benchmarks.readthedocs.io/en/docs/umbridge/clients.html#pymc-client
Announcement on pymc
discourse: https://discourse.pymc.io/t/new-pymcon-talk-enabling-uncertainty-quantification-by-anne-reinarz-linus-seelinger/13583