Correlation between estimated stime and ptime

Question

Closed this issue 2 years ago · 9 comments

See scripts/test_inftime_exp.R. I started thinking about whether a uniform distribution is a reasonable prior.

Red points are true values; black points are estimates. We don't have any other information to constrain ptime or stime, so it makes sense that we (basically) get the prior back. But we're actually not getting the priors back. You might notice that the estimated ptime is slightly biased towards the left and the estimated stime is slightly biased towards the right.

More importantly, we find a negative correlation between stime and ptime (why???). Note that the true ptime and stime should be uncorrelated:

This also means that the estimated stime-ptime is typically >0.
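
A quick way to check the claim that the true values are uncorrelated is to simulate them directly. The sketch below is not taken from scripts/test_inftime_exp.R. It assumes the plotted quantities are positions within the censoring day (the thread does not say so explicitly), zero growth (ptime uniform within its day), and a lognormal delay with meanlog = 1.8 and sdlog = 0.5. The names `simulate_offsets` and `pearson` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_offsets(n, meanlog=1.8, sdlog=0.5, seed=1):
    """Simulate true event times and return their within-day positions."""
    rng = random.Random(seed)
    p_off, s_off = [], []
    for _ in range(n):
        ptime = rng.random()                        # assumption: uniform within day 0
        delay = rng.lognormvariate(meanlog, sdlog)  # assumption: lognormal delay
        stime = ptime + delay
        p_off.append(ptime - math.floor(ptime))     # position of ptime within its day
        s_off.append(stime - math.floor(stime))     # position of stime within its day
    return p_off, s_off


p_off, s_off = simulate_offsets(20000)
true_corr = pearson(p_off, s_off)
print(f"correlation of true within-day positions: {true_corr:.3f}")
```

Under these assumptions the delay is spread over several days, so the within-day position of stime carries almost no information about the within-day position of ptime, and the printed correlation should be close to zero.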

~~I simulated these with r=0.2, meanlog = 1.8, sdlog = 0.5 and am getting meanlog = 1.85 (1.75--1.98) and sdlog = 0.45 (0.39--0.52). Slight bias in the estimate of meanlog caused by the correlation. I'm guessing the bias in sdlog is also caused by the same issue.~~ Actually, I have no idea what I simulated these with. I tried running this again and am now not getting bias in sdlog. Different seed maybe? meanlog is still biased though.

More to learn..

Answer 1 · 2022-11-01T06:32:12.000Z

Tried an exponential simulation without truncation. Getting very strong correlations:

Do we need to think about reparameterizing? I don't know if it will improve estimates... I also tried reparameterizing in terms of ptime and delay, but it still gives strong negative correlations.

Answer 2 · 2022-11-01T10:47:11.000Z

Interesting. First plot is very Joy Division-esque. I think we might need to use some simulation-based calibration (i.e. simulating multiple samples from the prior and checking coverage) to get a more robust grip on this.
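
For reference, the simulation-based calibration loop can be sketched as follows. This is only an illustration of the procedure, not the censoring model from this repository: it assumes a toy conjugate model (standard normal prior on a mean, unit-variance normal observations) so that the posterior can be sampled exactly without a fitting step. The function name `sbc_ranks` and all settings are hypothetical.

```python
import random


def sbc_ranks(n_sims=1000, n_obs=10, n_draws=99, seed=1):
    """Rank of each prior draw among its own posterior draws."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        # 1. Draw the "true" parameter from the prior N(0, 1).
        mu_true = rng.gauss(0.0, 1.0)
        # 2. Simulate data given that parameter.
        y = [rng.gauss(mu_true, 1.0) for _ in range(n_obs)]
        # 3. "Fit": the exact conjugate posterior is N(sum(y) / (n + 1), 1 / (n + 1)).
        post_var = 1.0 / (n_obs + 1)
        post_mean = sum(y) * post_var
        draws = [rng.gauss(post_mean, post_var ** 0.5) for _ in range(n_draws)]
        # 4. Record how many posterior draws fall below the true value.
        ranks.append(sum(d < mu_true for d in draws))
    return ranks


ranks = sbc_ranks()
mean_rank = sum(ranks) / len(ranks)
# Ranks take values 0..99; ranks 5..94 correspond to a central 90% interval.
coverage_90 = sum(5 <= r <= 94 for r in ranks) / len(ranks)
print(f"mean rank: {mean_rank:.1f}, 90% coverage: {coverage_90:.3f}")
```

If the fitting procedure is calibrated, the ranks are uniform and the rank-based coverage matches the nominal level. For the model in this issue, step 3 would be replaced by an actual model fit, which is where the computational cost comes from.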

The correlation doesn't seem ideal, but I guess it makes sense given we are trying to fit a fixed distribution to a population in which each individual has latent parameters. If we keep that distribution fixed and change ptime, then the easiest change overall is to change stime?
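
One possible way to make this intuition concrete is a grid calculation for a single individual. The sketch below rests on strong assumptions that differ from the real model: the delay distribution is treated as known (lognormal, meanlog = 1.8, sdlog = 0.5), both events are censored to unit-width daily windows with uniform priors, and `k` (a hypothetical name) is the gap between the lower bounds of the two windows. It is not the model in this repository.

```python
import math


def lognormal_pdf(d, meanlog, sdlog):
    """Lognormal density; zero for non-positive delays."""
    if d <= 0:
        return 0.0
    z = (math.log(d) - meanlog) / sdlog
    return math.exp(-0.5 * z * z) / (d * sdlog * math.sqrt(2 * math.pi))


def posterior_mean_offsets(k, meanlog=1.8, sdlog=0.5, m=50):
    """Posterior mean within-window positions (u for ptime, v for stime).

    With uniform priors, the posterior over (u, v) is proportional to the
    delay density evaluated at k + v - u, computed here on a midpoint grid.
    """
    grid = [(i + 0.5) / m for i in range(m)]
    total = sum_u = sum_v = 0.0
    for u in grid:
        for v in grid:
            w = lognormal_pdf(k + v - u, meanlog, sdlog)
            total += w
            sum_u += w * u
            sum_v += w * v
    return sum_u / total, sum_v / total


offsets = {k: posterior_mean_offsets(k) for k in range(1, 16)}
for k, (u, v) in offsets.items():
    print(k, round(u, 3), round(v, 3))
```

Under these assumptions, swapping (u, v) for (1 - v, 1 - u) leaves the delay k + v - u unchanged, so the two posterior means always sum to 1. When the window gap is short relative to the delay distribution, ptime is pulled left and stime right; when it is long, the reverse happens. Across individuals with different gaps, that alone produces a negative association between the estimated positions. The real model also estimates the delay parameters, so the effect there need not be this clean.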

Answer 3 · 2022-11-01T10:48:19.000Z

I think reparameterising could make sense (though ideally we want to keep this method, as it is widely used in the literature...). The reparameterisation you have tried seems like the obvious one, but I think it doesn't change the shape of the posterior enough to prevent the negative correlation from happening (well, obviously, given it didn't work).

A potential solution would be to sample from the uniform priors and then fit the model for each sample as a truncated but continuous model? That will be very computationally expensive and not ideal.

I think the better solution to suggest is to provide more information on when an event is likely to occur (i.e. by having a transmission process to inform the prior). I'm not sure we should attempt to solve that here vs just pointing it out.

Answer 4 · 2022-11-01T10:55:23.000Z

Have you explored what happens in a zero-growth setting where the uniform prior is correct?

> But we're actually not getting the priors back. You might notice that the estimated ptime is slightly biased towards the left and the estimated stime is slightly biased towards the right.

For your first figure how does this match up with your simulated data? Is the bias in the direction of the bias in the data or unrelated?

Answer 5 · 2022-11-02T03:30:18.000Z

> I think we might need to use some simulation-based calibration (i.e. simulating multiple samples from the prior and checking coverage) to get a more robust grip on this.

Agreed.

> A potential solution would be to sample from the uniform priors and then fit the model for each sample as a truncated but continuous model? That will be very computationally expensive and not ideal.

Definitely not ideal and computationally expensive. Also, posterior samples for each sample would be associated with different posterior distribution probabilities, which we need to account for. And the current method is already doing that. So maybe this correlation is intrinsic to the problem.

> I think the better solution to suggest is to provide more information on when an event is likely to occur (i.e. by having a transmission process to inform the prior). I'm not sure we should attempt to solve that here vs just pointing it out.

This is possible but difficult. I don't think we should be too worried about not being able to estimate each event time accurately. As long as we're doing OK on average, we should be OK.

> For your first figure how does this match up with your simulated data?

Doesn't match up as far as I remember. But more simulations coming soon.

> Have you explored what happens in a zero-growth setting where the uniform prior is correct?

I think the uniform prior might not be actually correct in this setting. More coming soon.

Answer 6 · 2022-11-08T13:22:04.000Z

I just updated and reran this to get the following:

So it seems like recent updates have reduced but not entirely mitigated this. Oddly, there now appears to be "banding" in the correlation plot.

Answer 7 · 2023-01-16T16:41:05.000Z

Where are we with this? I think this is at the "add as a discussion piece" stage?

Answer 8 · 2023-01-17T08:48:06.000Z

This is partly covered in #21 and #27.

Otherwise, it doesn't seem like there's a way to get rid of this. Definitely should be discussed in the paper. But could also be included in the main multi-panel figure explaining issues with censoring.

Answer 9 · 2023-02-21T19:03:00.000Z

This has been implemented into the paper as a figure so closing.