CobayaSampler/cobaya

Inconsistent Temperature error when trying to restart a MCMC run

Closed this issue · 5 comments

Hi,

I am having an issue when trying to resume an MCMC run with Cobaya. When it loads the previous samples, it raises the error "[0 : samplecollection] *ERROR* The sample seems to have an inconsistent temperature." and then the process stops.

I do not really understand what the temperature is in practice, or why it prevents Cobaya from continuing the MCMC run.

Also, could this inconsistent temperature come from the first rows of the chains, which we could choose to discard when loading?
When Cobaya tries to resume, it prints "[0 : samplecollection] Skipping 0 rows".
So perhaps there could be a way to discard the first rows when loading the samples to resume a run, and thereby avoid the temperature error?

I don't know if that makes sense.

Bests,
Louis
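To test the idea that the inconsistent temperatures sit in the first rows, one could compute a per-row temperature and list the rows that deviate. The sketch below is hypothetical: the chain layout (a "#"-prefixed header of column names, one sample per row), the column names logpost, logprior and loglike, the helpers load_chain and inconsistent_rows, and the definition logpost = (logprior + loglike) / T are all assumptions for illustration, not Cobaya's actual file format or code.

```python
import io

import numpy as np


def load_chain(text: str, skip_rows: int = 0) -> dict[str, np.ndarray]:
    """Hypothetical loader: parse a '#'-prefixed header, then drop `skip_rows` samples."""
    lines = text.strip().splitlines()
    names = lines[0].lstrip("#").split()
    data = np.loadtxt(io.StringIO("\n".join(lines[1:])), ndmin=2)
    data = data[skip_rows:]  # discard initial rows, as the question suggests
    return {name: data[:, i] for i, name in enumerate(names)}


def inconsistent_rows(chain: dict[str, np.ndarray], rtol: float = 1e-5) -> np.ndarray:
    """Indices of rows whose temperature is not close to the chain's median temperature."""
    # Assumed definition of the tempered posterior: logpost = (logprior + loglike) / T
    temp = (chain["logprior"] + chain["loglike"]) / chain["logpost"]
    # Compare with the median rather than temp[0], so a bad first row does not flag all others
    reference = np.median(temp)
    return np.flatnonzero(~np.isclose(temp, reference, rtol=rtol))


# Made-up chain whose first row has T = 1000.0 / 1000.5, about 1 - 5e-4
CHAIN_TEXT = """
# weight logpost logprior loglike
1 -1000.5 -2.0 -998.0
1 -1000.0 -2.0 -998.0
1 -1500.0 -3.0 -1497.0
1 -800.0 -1.0 -799.0
"""

chain = load_chain(CHAIN_TEXT)
print(inconsistent_rows(chain))  # -> [0]
```

If the flagged indices cluster at the start of a chain, dropping them (here via skip_rows=1) would remove the offending rows; if they are spread throughout, discarding an initial segment would not help.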

cmbant commented

Could you attach the files, or look at the values in collection.compute_temperature that are raising the error? Maybe allclose is too stringent once chains have been loaded from files stored with limited precision; you could try changing its optional tolerance arguments (rtol, atol).
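A minimal sketch of how the allclose tolerance decides whether a set of per-sample temperatures counts as consistent. The helper name temperature_is_consistent and the sample values are hypothetical; the defaults shown are numpy's own (rtol=1e-5, atol=1e-8), not necessarily the tolerances Cobaya passes.

```python
import numpy as np


def temperature_is_consistent(temp: np.ndarray, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Hypothetical check: True if every temperature is close to the first one.

    np.allclose tests |temp - temp[0]| <= atol + rtol * |temp[0]| element-wise.
    """
    return bool(np.allclose(temp, temp[0], rtol=rtol, atol=atol))


# Made-up temperatures scattered by up to 5e-4 around 1
temp = np.array([1.0, 1.0 + 5e-4, 1.0 - 5e-4, 1.0 + 2e-4])

print(temperature_is_consistent(temp))             # numpy defaults -> False
print(temperature_is_consistent(temp, rtol=1e-3))  # relaxed rtol -> True
```

With temp[0] close to 1, rtol effectively sets the largest allowed deviation, so a spread of about 5e-4 fails at rtol=1e-5 but passes at rtol=1e-3.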

Hi Antony,

It seems that the temperature is around 1 ± 5e-4 for some of the chains; I have attached the files.
If I put np.allclose(temp, temp[0], rtol=1e-3) there instead, the test passes.

I see that the stored values of logpost, loglike and logprior have 8-digit precision.
The relative precision of the temperature should then be

δT / T ≈ 3 × precision,

i.e. about 3e-8, which is much smaller than the 5e-4 deviations I get when I load the chains. I don't know why they are so different.

03_param_qe_tt_n32_tot_desi_cmb.zip

[edited because of a mistake in the estimate of the error]
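A quick numerical check of the estimate above, under stated assumptions: the temperature is taken to be T = (logprior + loglike) / logpost (an assumed definition, not necessarily Cobaya's exact formula), the stored values are rounded to 8 significant digits, and the magnitudes of the log-probabilities are made up for illustration. The helper round_sig is hypothetical.

```python
import numpy as np


def round_sig(x: np.ndarray, digits: int = 8) -> np.ndarray:
    """Round each value to `digits` significant figures, like writing it with %.8g."""
    return np.array([float(f"{v:.{digits}g}") for v in x])


rng = np.random.default_rng(0)
n = 10_000
# Made-up log-probabilities for a chain at T = 1
loglike = -rng.uniform(500.0, 5000.0, n)
logprior = -rng.uniform(1.0, 50.0, n)
logpost = logprior + loglike  # assumed: logpost = (logprior + loglike) / T with T = 1

# Simulate storage with 8 significant digits, then recompute T from the stored values
lp, lpr, ll = (round_sig(a, 8) for a in (logpost, logprior, loglike))
temp = (lpr + ll) / lp

max_rel_dev = np.max(np.abs(temp - 1.0))
print(f"max |T - 1| after rounding: {max_rel_dev:.1e}")
```

Under these assumptions the rounding alone produces deviations of order 1e-8 to 1e-7, consistent with the estimate in the comment and far below 5e-4; the check does not by itself identify where the larger spread comes from.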

cmbant commented

Thanks, I'll have a look. By the way, you should update CAMB to get the fix in v1.5. (Some of your chains seem to be stuck; also, it is usually not necessary to run more than 4 chains.)

cmbant commented

I've just relaxed the test precision for now (4d7cecf). Let me know if any of the other tests have issues. When he's back from holiday, @JesusTorrado should probably also check the precision of these tests in more detail.

Thanks, it all works fine for me with this lower precision, and the other tests are fine too.