mackelab/delfi

Questions / enhancement proposals

jajcayn opened this issue · 6 comments

Dear developers,

I started playing around with delfi after hearing a wonderful talk by Jakob Macke at Neuromatch. I ran the tutorial, went over the code, and have a couple of questions and enhancement proposals:

  • I am not sure what your plans are for the future of delfi, but as you probably already know, Theano is deprecated, so for future maintenance you will need to switch the computational backend. Do you have any plans to do that? I have worked a bit with MDNs in TensorFlow 2+ and they are very easy to build. Also, tensorflow-probability implements masked autoregressive flows, which should be directly usable. I am raising this because even today I had problems installing Theano on both macOS and Ubuntu, so it is starting to be a problem.

  • When I try to plot, I get errors because matplotlib replaced normed=True with density=True in histograms. I was able to fix it in my local install; would you like me to open a PR?

  • It would be really helpful to have an option to save the posteriors and/or the trained weights of the network. In many cases I would like to train the net to get the posterior, and later either come back to the posteriors for plotting or restore the saved state of the net to train it a bit more.

  • I wanted to plot my posterior using viz.plot_pdf after running SNPEC with a MAF. Although the MAFconditional object resembles the distribution classes in that it has gen and eval methods, its eval method has a different call signature, so it cannot be plotted with plot_pdf. Is this expected?
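
For the normed to density rename in the second point, one possible local workaround is to translate the keyword before it reaches matplotlib. This is only a sketch: the helper name modernize_hist_kwargs is hypothetical and is not part of delfi or matplotlib; it merely rewrites a dictionary of keyword arguments.

```python
def modernize_hist_kwargs(kwargs):
    """Return a copy of kwargs with the old `normed` key renamed to `density`."""
    updated = dict(kwargs)  # leave the caller's dictionary untouched
    if "normed" in updated:
        normed_value = updated.pop("normed")
        # An explicit `density` wins if both keys were passed.
        updated.setdefault("density", normed_value)
    return updated


old_style = {"bins": 50, "normed": True}
new_style = modernize_hist_kwargs(old_style)
# Intended use (assumes matplotlib is installed and `ax` is an Axes):
#     ax.hist(samples, **new_style)
```

Replacing the keyword directly at each call site is the simpler fix; a helper like this is mainly useful when the plotting options are passed around as a dictionary.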

Thanks a lot,
best regards!

N.

Hi Nikola, it's really great that you liked Jakob's talk!

In fact, your main wish may come true rather soon :), albeit with PyTorch rather than TensorFlow. Have a look at our brand new package SBI! You'll be happy to hear that, for flows, we build on the extensive and excellent work by C. Durkan, A. Bekasov, I. Murray and G. Papamakarios.

I will defer to other members of the team for the other points. We're generally very keen on usage reports, PRs and general comments on how it's working for your research, so keep them coming!

Thanks,

Álvaro.

Hi Nikola,

  • a PR would be great!
  • this is great feedback, we'll try to incorporate this into the sbi package Álvaro mentioned above. Thanks!
  • for plotting with flows, please use samples_nd(). Unlike plot_pdf(), it does not call the eval() method; it only requires samples from the posterior. plot_pdf() does not work with flows because it plots the marginals (and pairwise marginals). Getting the marginals of a high-dimensional distribution is simple for mixtures of Gaussians, but it cannot be done for flows. See this tutorial for an example of how to use samples_nd().
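
To make the point about marginals concrete: for a Gaussian, the marginal over a subset of dimensions is again Gaussian, with the matching entries of the mean and the matching sub-block of the covariance. A mixture of Gaussians is marginalised component by component, with the mixture weights unchanged. A minimal sketch in plain Python (the function name marginalize_gaussian is hypothetical and not part of delfi):

```python
def marginalize_gaussian(mean, cov, dims):
    """Marginal of N(mean, cov) over the dimensions listed in `dims`.

    `mean` is a list of floats, `cov` a nested list (square matrix).
    Returns the mean and covariance of the marginal Gaussian.
    """
    sub_mean = [mean[i] for i in dims]
    sub_cov = [[cov[i][j] for j in dims] for i in dims]
    return sub_mean, sub_cov


mean = [0.0, 1.0, 2.0]
cov = [
    [1.0, 0.2, 0.0],
    [0.2, 2.0, 0.5],
    [0.0, 0.5, 3.0],
]

# Pairwise marginal over dimensions 0 and 2.
pair_mean, pair_cov = marginalize_gaussian(mean, cov, [0, 2])
```

A flow offers no such index-selection shortcut, which is why a sample-based plot is used for it instead.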

Best
Michael

Hey guys, thanks a lot for your answers :)

I just submitted a PR for the normed -> density problem.
And thanks for pointing me to your new SBI package! I am stoked to try it out, but I guess I'll wait for the official release. In the meantime I'll stick to using delfi.

One last question :)
I successfully ran inference on one of my models and have just started to analyse the results. (Btw, for the time being I hacked saving by pickling everything after the inference on a remote machine and unpickling it on my computer.) I realised that I would also like to save the runs of the forward model (meaning the original timeseries), or rather the dictionary that the BaseSimulator instance is expected to return. I was looking for the place in the code where the actual model is run during inference and found this:

result = self.model.gen(params_batch, n_reps=n_reps, pbar=pbar)

Is this the correct place? So the result is actually the dictionary from my Simulator?

If so, I'll try to hack my way around it and save the timeseries, e.g. to an HDF5 file using PyTables. I was also thinking of not saving ALL of the model runs, but just some of them (e.g. when the stats are way off my desired ones, so maybe I'll try to add some filtering). If you have any pointers or ideas on how this could be done, I'll be grateful for your help. Also, if I manage to do it and this is something you'd like to see in delfi, I can open a PR for it :)
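
A rough sketch of the "save only some runs" idea, using only the standard library. Everything here is an assumption for illustration: the helper names, the Euclidean distance criterion, and the layout of runs as (stats, timeseries) pairs are hypothetical and do not reflect delfi's internals. Pickle stands in for HDF5 to keep the example dependency-free.

```python
import math
import os
import pickle
import tempfile


def distance(stats, target):
    """Euclidean distance between two equally long sequences of numbers."""
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(stats, target)))


def select_runs(runs, target, tol):
    """Keep the (stats, timeseries) pairs whose stats lie within `tol` of `target`."""
    return [(stats, ts) for stats, ts in runs if distance(stats, target) <= tol]


def save_runs(runs, path):
    """Pickle the selected runs to `path`."""
    with open(path, "wb") as handle:
        pickle.dump(runs, handle)


def load_runs(path):
    """Load runs previously written by save_runs."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Toy data: (summary stats, timeseries) pairs.
runs = [
    ([1.0, 2.0], [0.1, 0.2, 0.3]),
    ([9.0, 9.0], [5.0, 5.0, 5.0]),
]
kept = select_runs(runs, target=[1.0, 2.1], tol=0.5)

path = os.path.join(tempfile.mkdtemp(), "kept_runs.pkl")
save_runs(kept, path)
restored = load_runs(path)
```

To keep the runs that are far from the target instead, the comparison in select_runs can simply be inverted.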

Thanks again for the superb job! :)
N.

Thanks for the PR, I just merged it :)

The line of code you linked is somewhat correct, in that result will be a list of these dictionaries. Note that self.model.gen() takes a batch of parameter sets, which are then looped over. That loop happens here:

for param in params_list:

Since you want to store specific simulations, I think you have (at least) three options:

  1. Filter and store the simulations somewhere within this loop.
  2. Add the code that filters and saves the timeseries to the gen_single() function you had to provide to delfi. gen_single() is called in the loop I mentioned above, so everything should be taken care of.
  3. Loop over the result variable you mentioned above and pick the simulations you like ;)
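
Option 2 could look roughly like the sketch below. It wraps an existing simulation function so that selected outputs are recorded as a side effect. The names recording, keep and store, and the assumption that the wrapped function takes a single parameter vector and returns a dictionary with a "data" key, are illustrative and may not match delfi's actual interface.

```python
import functools


def recording(gen_single, keep, store):
    """Wrap gen_single so outputs accepted by keep(params, out) are appended to store."""

    @functools.wraps(gen_single)
    def wrapper(params):
        out = gen_single(params)
        if keep(params, out):
            store.append((list(params), out))
        return out  # the caller still receives the unmodified result

    return wrapper


def toy_gen_single(params):
    """Stand-in simulator: a short 'timeseries' scaled by the first parameter."""
    return {"data": [params[0] * t for t in range(3)]}


store = []
sim = recording(
    toy_gen_single,
    keep=lambda params, out: max(out["data"]) > 1.0,
    store=store,
)
outputs = [sim(p) for p in ([0.1], [2.0])]
```

Note that an in-memory list like store only works within a single process; with several worker processes, each worker would fill its own copy.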

Does that make sense?

Michael

hey,
thanks for the pointer. I am finding out it is a bit more complicated :) If I do not want to save each run in its own file (which is the last resort), there are some problems:

  1. I cannot save in the Simulator loop, since I want to filter based on statistics from the SummaryStats class.
  2. Saving in my Simulator's gen_single is tricky when I am using multiprocessing... It's hard not to mess things up when you open a file (e.g. h5) and then write arrays to it in parallel. It never works.
  3. Looping over the result variable in Generator works only in the single-threaded case; in the parallel case with MPGenerator I'd also need to send the results through the pipe, not only the stats :)

But I'll figure something out :) In the worst case, I'll just save one file per simulation, end up with a lot of them, and then stitch them together ;)
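
The one-file-per-simulation fallback can be sketched as follows. Each call writes to its own uniquely named file, so parallel workers never share a file handle, and a final pass collects everything. The file naming, the pickle format and the function names are assumptions for illustration, not part of delfi.

```python
import glob
import os
import pickle
import tempfile
import uuid


def save_run(out_dir, params, timeseries):
    """Write one simulation to its own file; the random name avoids collisions."""
    path = os.path.join(out_dir, "run_{}.pkl".format(uuid.uuid4().hex))
    with open(path, "wb") as handle:
        pickle.dump({"params": params, "timeseries": timeseries}, handle)
    return path


def stitch_runs(out_dir):
    """Load every per-run file in out_dir into a single list."""
    runs = []
    for path in sorted(glob.glob(os.path.join(out_dir, "run_*.pkl"))):
        with open(path, "rb") as handle:
            runs.append(pickle.load(handle))
    return runs


out_dir = tempfile.mkdtemp()
for k in range(3):
    save_run(out_dir, params=[float(k)], timeseries=[k, k + 1])
all_runs = stitch_runs(out_dir)
```

Because the file names are random, the stitched list is not in simulation order; storing the parameters alongside each timeseries keeps the runs identifiable.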

Thanks again!
Shall I close this issue?

Ah, alright, I see. Sorry I couldn't help more, hope it works out!

Michael