Figure out how to get consistent delG predictions

Question

sonyahanson opened this issue 8 years ago · 10 comments

I'm currently writing an interface that quickly analyzes, stores, and plots the results of the Bayesian analysis, and noticed that with our simple pymc model function we aren't getting consistent results when rerunning the analysis:

from assaytools import pymcmodels

pymc_model = pymcmodels.make_model(Pstated, dPstated, Lstated, dLstated,
    top_complex_fluorescence=reorder_protein,
    top_ligand_fluorescence=reorder_buffer,
    use_primary_inner_filter_correction=True,
    use_secondary_inner_filter_correction=True,
    assay_volume=assay_volume, DG_prior='uniform')

I mentioned these inconsistencies in my last lab meeting, and this is a big reason why I've started working on #56, but I thought I'd post it here as well, since it's coming up again.

In the image below, each pair of adjacent plots is from the same dataset, but the analysis was run at different times (note the timestamps). The plotting is slightly different (delG in title vs. in legend), as this is what I was playing with when repeating the analysis. The results seem to be fine for Bosutinib Isomer, but not for Erlotinib (sorry the image is a bit fuzzy...).

Currently parameters for MCMC sampling are:

niter = 500000 # number of iterations
nburn = 50000 # number of burn-in iterations to discard
nthin = 500 # thinning interval
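
For reference, a quick check of how many samples these settings leave in the trace. This is a minimal sketch that assumes the sampler keeps one draw every nthin iterations after discarding the first nburn, and that the histogram uses 40 bins as in the plt.hist call quoted later in this thread:

```python
niter = 500000  # number of iterations
nburn = 50000   # number of burn-in iterations to discard
nthin = 500     # thinning interval

# Assumption: one retained sample per nthin iterations after burn-in.
n_kept = (niter - nburn) // nthin
print(n_kept)  # 900

# Average number of samples per histogram bin with 40 bins.
nbins = 40
print(n_kept / nbins)  # 22.5
```

With only a few hundred retained samples, some run-to-run noise in the histograms is to be expected.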

Answer 1 · 2016-06-30T22:37:03.000Z

I'm super-confused as to why the results are inconsistent for erlotinib. There's no way the average DeltaG is -11.5 kT in that lower right panel---just look at the histogram!

I wonder if the maximum likelihood estimate---rather than the mean of the MCMC sampler history---is being reported as -11.5?

Answer 2 · 2016-07-28T16:05:16.000Z

So in this plot the histogram is drawn with:

plt.hist(mcmc.DeltaG.trace(), 40, alpha=0.75, label="DeltaG = %.1f +- %.1f kT"%(DeltaG, dDeltaG))

where

DeltaG = map.DeltaG.value
dDeltaG = mcmc.DeltaG.trace().std()

I chose this because it is what's currently used in show_summary.

If I change to DeltaG = mcmc.DeltaG.trace().mean() this seems improved.

Answer 3 · 2016-07-28T16:38:33.000Z

The MAP is not a robust statistic if the posterior is multimodal. Let's go with the mean as a more robust statistic.
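
To illustrate the point, here is a minimal sketch using only the Python standard library. The two-mode density and all of its numbers are hypothetical stand-ins, not the assaytools model or the erlotinib data; `local_map` and `sample_trace` are illustrative helpers, and the "trace" is drawn directly from the mixture as a proxy for a well-mixed MCMC run.

```python
import math
import random
import statistics

# Hypothetical bimodal posterior for DeltaG (in kT): (center, width, weight).
MODES = [(-11.5, 0.3, 0.3), (-8.0, 1.0, 0.7)]

def log_density(x):
    """Log of a two-Gaussian mixture density."""
    total = sum(
        w / (s * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((x - m) / s) ** 2)
        for m, s, w in MODES
    )
    return math.log(total)

def local_map(x0, step=0.01, n_steps=5000, h=1e-5):
    """Crude gradient ascent; converges to the local maximum near x0."""
    x = x0
    for _ in range(n_steps):
        grad = (log_density(x + h) - log_density(x - h)) / (2 * h)
        x += step * grad
    return x

def sample_trace(seed, n=20000):
    """Draw n samples from the mixture (stand-in for a well-mixed trace)."""
    rng = random.Random(seed)
    weights = [w for _, _, w in MODES]
    trace = []
    for _ in range(n):
        m, s, _w = rng.choices(MODES, weights=weights)[0]
        trace.append(rng.gauss(m, s))
    return trace

# The optimizer's answer depends on where it starts.
map_a = local_map(-12.0)  # ends near the narrow mode at -11.5
map_b = local_map(-7.0)   # ends near the broad mode at -8.0

# The trace mean is stable across independent runs.
mean_a = statistics.mean(sample_trace(seed=1))
mean_b = statistics.mean(sample_trace(seed=2))

print("MAP from two starts: %.2f, %.2f" % (map_a, map_b))
print("mean from two runs:  %.2f, %.2f" % (mean_a, mean_b))
```

In this toy case the tallest peak is the narrow mode at -11.5 kT even though it carries only 30% of the probability mass, so a MAP estimate can land far from where most of the histogram sits. The comparison assumes the sampler visits both modes; a chain stuck in one mode would give an unreliable mean as well.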

Answer 4 · 2016-07-28T17:36:51.000Z

Should I also change this in show_summary? Was there a reason for choosing MAP there?

Answer 5 · 2016-07-28T17:38:04.000Z

How about we show both MAP and mean there?

Answer 6 · 2016-07-28T17:38:58.000Z

What does MAP add out of curiosity?

Answer 7 · 2016-08-01T17:40:18.000Z

Finally made a pull request with these choderalab/fluorescence-assay-manuscript#6

Not so bad:

Answer 8 · 2016-08-01T18:08:46.000Z

> What does MAP add out of curiosity?

Sorry for missing this earlier! MAP is similar to the solution that you would get from a standard likelihood-maximization scheme, though it's only a unique solution if the posterior is unimodal (which it may not be with something this complicated). But it's something like a "traditional" solution. It may not add much for us because our posterior may be more complicated---the mean may be much more robust.

Answer 9 · 2016-08-01T18:09:50.000Z

> Finally made a pull request with these choderalab/fluorescence-assay-manuscript#6

These are looking pretty good! We could do more sampling to smooth out those histograms, or use a kernel density estimate with an adaptive bandwidth, and maybe discard a bit more to equilibration. (Are we using automatic equilibration detection yet, or is the burn-in period fixed?)

Answer 10 · 2016-09-19T20:50:39.000Z

So this definitely seems to be more of a problem for the spectra assays than the singlet assays: