Priesemann-Group/covid19_inference

Add sampling with dense mass matrix

Closed this issue · 14 comments

For better performance, it could be useful to sample with a full (dense) mass matrix instead of only the diagonal. The newest, not yet released version (3.9) of PyMC3 offers this as an option in pm.sample, but the functions needed for it are already available in earlier releases.

The goal would be to implement dense mass matrix sampling that works in PyMC3 3.7 and 3.8 and to test whether it performs better (higher effective sample size).
References to get started:
pymc-devs/pymc#3596
pymc-devs/pymc#3845
https://dfm.io/posts/pymc3-mass-matrix/

The way to go would be:

  • Make a new example notebook, starting as a copy of example_one_bundesland.ipynb
  • Test whether sampling with a dense mass matrix is better. The effective sample size is the relevant statistic (pm.stats.ess)
  • If yes, test it for example_bundeslaender.ipynb
  • If it is also better there, change the example notebooks to use the full dense matrix.
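
As a rough sketch of the idea behind the references above: the dense mass matrix is built from a covariance estimate of the warm-up draws, shrunk slightly toward a small diagonal so that it stays positive definite. The function name, the parameter names and their defaults below are assumptions for illustration (loosely modelled on the regularisation described in the blog post), and the draws are synthetic. In PyMC3 the resulting matrix would presumably be wrapped in QuadPotentialFull from pymc3.step_methods.hmc.quadpotential and handed to pm.NUTS via its potential argument; that part is left out here so the snippet runs with NumPy alone.

```python
import numpy as np


def regularized_covariance(draws, regular_window=5, regular_variance=1e-3):
    """Estimate a dense covariance matrix from warm-up draws.

    draws: array of shape (n_draws, n_dim) with at least two draws.
    regular_window and regular_variance are illustrative defaults that
    shrink the estimate toward a small diagonal matrix.
    """
    draws = np.asarray(draws, dtype=float)
    n_draws, n_dim = draws.shape
    # Sample covariance; reshape keeps the 1-D case as a (1, 1) matrix.
    cov = np.cov(draws, rowvar=False).reshape(n_dim, n_dim)
    # Weighted average of the sample covariance and a small diagonal.
    weight = n_draws / (n_draws + regular_window)
    return weight * cov + (1.0 - weight) * regular_variance * np.eye(n_dim)


# Synthetic, strongly correlated draws standing in for a real warm-up trace.
rng = np.random.default_rng(seed=0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
warmup_draws = rng.multivariate_normal(np.zeros(2), true_cov, size=500)

cov = regularized_covariance(warmup_draws)
# The sampler needs a Cholesky factor; this raises LinAlgError if cov
# is not positive definite.
chol = np.linalg.cholesky(cov)
```

The shrinkage matters most when there are few warm-up draws or when two parameters are almost perfectly correlated, since the plain sample covariance can then be singular.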

I will try that.

Perfect

I tried it in the one_bundesland notebook in my fork: https://github.com/emilIftekhar/covid19_inference
One can open the netCDF output files with xarray.open_dataset()

Thanks, could you make some plots to compare the ess of the different variables?
Perhaps a bar plot for every variable? At least for the ones that aren't near 500 samples.
As it is, they are very difficult to compare...
Otherwise it would perhaps be most reasonable to first test the latest master of pymc3, as they have probably improved on the version published in the blog post.
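
For the comparison itself, something along these lines might already help before plotting. It is only a sketch: compare_ess is a hypothetical helper, it assumes the ESS values have been collected into plain dictionaries keyed by variable name, and the numbers below are made-up placeholders, not results from the notebook.

```python
def compare_ess(ess_diag, ess_dense):
    """Return (name, ess_diag, ess_dense, ratio) rows, worst ratio first.

    Only variables present in both dictionaries are compared.
    ratio > 1 means the dense mass matrix gave more effective samples.
    """
    rows = []
    for name in sorted(set(ess_diag) & set(ess_dense)):
        diag = float(ess_diag[name])
        dense = float(ess_dense[name])
        ratio = dense / diag if diag > 0 else float("inf")
        rows.append((name, diag, dense, ratio))
    rows.sort(key=lambda row: row[3])
    return rows


# Placeholder values for illustration only.
example_diag = {"var_a": 120.0, "var_b": 480.0}
example_dense = {"var_a": 360.0, "var_b": 450.0}

for name, diag, dense, ratio in compare_ess(example_diag, example_dense):
    print(f"{name:10s} diag={diag:7.1f} dense={dense:7.1f} ratio={ratio:5.2f}")
```

Sorting by the ratio puts the variables that got worse at the top, which are the ones worth a closer look in a bar plot.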

How urgently do you need it? If it is ok, I would first tackle some of my other tasks today. I would probably get back to this issue tomorrow.

No, it isn't that urgent. And yes, this issue takes a bit of time to get right.

In order to try the module from the master branch, I created a new environment on my computer, cloned the repo and then installed it. But now my Jupyter notebook has trouble importing from pymc3. Do you know what could have gone wrong?

ImportError: cannot import name 'Model' from 'pymc3' (unknown location)

Where is my new pymc3 folder supposed to be? Maybe that is the problem. I cloned it into the main covid19_inference directory.

Ok, I reinstalled it, but the full mass matrix option does not seem to work yet.
It raises a LinAlgError:

`~/anaconda3/envs/githubPymc3/lib/python3.8/site-packages/scipy/linalg/decomp_cholesky.py in _cholesky(a, lower, overwrite_a, clean, check_finite)
37 c, info = potrf(a1, lower=lower, overwrite_a=overwrite_a, clean=clean)
38 if info > 0:
---> 39 raise LinAlgError("%d-th leading minor of the array is not positive "
40 "definite" % info)
41 if info < 0:

LinAlgError: 2-th leading minor of the array is not positive definite`

Should I get back to doing it manually or do you think it is worthwhile to keep trying it with the new module?

Mmh, it could also be that this error is due to our model, i.e. that some gradient can't be calculated. You could first check whether a model without change points works. These are the tricky bits.
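
To narrow it down, it might help to look at the covariance matrix that is handed to the Cholesky decomposition. The message "k-th leading minor ... not positive definite" refers to the top-left k-by-k block of the matrix, so the following sketch searches for the smallest such block that fails. first_bad_leading_minor is a hypothetical helper and the matrix is a toy example, not taken from the model.

```python
import numpy as np


def first_bad_leading_minor(matrix):
    """Return the smallest k for which matrix[:k, :k] has no Cholesky
    factor, or None if the whole matrix is positive definite."""
    a = np.asarray(matrix, dtype=float)
    for k in range(1, a.shape[0] + 1):
        try:
            np.linalg.cholesky(a[:k, :k])
        except np.linalg.LinAlgError:
            return k
    return None


# Toy matrix: the off-diagonal entry is too large for the given variances,
# so the 2x2 block is indefinite.
toy_cov = np.array([[1.0, 2.0], [2.0, 1.0]])
print(first_bad_leading_minor(toy_cov))  # 2
# A negative smallest eigenvalue confirms the matrix is not positive definite.
print(np.linalg.eigvalsh(toy_cov).min())
```

If k points at a particular parameter, that parameter (or its combination with the earlier ones) would be the first place to check, for example for near-perfect correlation or non-finite values in the warm-up draws.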

With the new make_I_prior, the correlation between variables in the models is generally rather low. As such, one wouldn't gain much from a dense matrix. Closing it for now.