Refactor infection and contact simulation into joint process
Closed this issue · 2 comments
Following on from issue #35, it was decided to reformulate how the infections and contacts would be simulated for {simulist}.
Instead of using a single-type branching process (bpmodels::chain_sim()), which currently can only provide infected individuals, we will simulate contacts and infections together, using a contact distribution and a probability of infection. This simulation will take into account the network effects outlined in the first comment of #35.
Given how fundamental this issue is to the simulated data output by {simulist} functions, we are rearranging the priority tasks for the next release. This issue will now be the primary target for v0.2.0.
I think the approach we take here will depend on what we envisage {simulist} being used for, as some solutions could be quite time-consuming. It seems there are two main approaches we could take:
1. Simulate the full contact network and accompanying secondary cases. This is the only way to fully account for network effects and get correct distributions, clustering effects, etc. There are implementations of this already, e.g. covidhm, an extension of ringbp. But this is a more computationally intensive model and still requires assumptions about the underlying network.
2. Simulate transmission using a branching process and make some reasonable approximations about the contact distribution that would meet user needs for simulist (e.g. maintain certain distributional properties, like the correct contact and offspring distributions, and ensure contacts ≥ transmissions for each individual, with the relationship between them following a binomial distribution with p = secondary attack rate).
There are at least a couple of ways of doing (2):
A. Simulate transmission events as in the current version, then generate contacts for each case based on individual transmission events and the secondary attack rate (because we can think of a transmission event as a draw from a binomial with p=SAR, individuals who cause more transmission are likely to have more contacts). So if A is the number of transmissions, I think the number of contacts N will have a pmf equal to the weighted binomial probabilities for each possible N_i, i.e. P(A | N_i, p). We can then just check that the resulting simulation gives a sensible match to the overall contact distribution (which it should if both contacts and offspring are negative binomial and SAR is sensibly specified).
B. Use the branching process model to simulate contacts rather than transmission. Then once we have this contact network, have some post-processing to calculate which contacts became infected (and hence which of their contacts might have become infected etc.) This approach would require some additional book-keeping, and wouldn't account for network clustering, but should give sensible distributions. Simulating individual contacts from a distribution, then simulating random transmissions among these contacts based on SAR is something we did at individual level here (splitting by household and non-household given higher SAR in HH): https://github.com/adamkucharski/2020-cov-tracing.
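For concreteness, option (B) could look roughly like the sketch below. It is plain Python rather than {simulist} or bpmodels code, and everything in it is an assumption made for illustration: the function names, the negative binomial contact distribution drawn as a gamma-Poisson mixture, and the fixed number of generations. It first grows a contact tree, then post-processes it so that a contact can only be infected if its source was.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson count with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_negbin(rng, mean, dispersion):
    """Negative binomial draw, written as a gamma-Poisson mixture."""
    rate = rng.gammavariate(dispersion, mean / dispersion)
    return sample_poisson(rng, rate)


def simulate_contact_tree(rng, mean_contacts, dispersion, generations):
    """Return {contact_id: source_id}; the index case is id 0 with source None."""
    source_of = {0: None}
    current = [0]
    for _generation in range(generations):
        next_generation = []
        for source in current:
            for _ in range(sample_negbin(rng, mean_contacts, dispersion)):
                new_id = len(source_of)
                source_of[new_id] = source
                next_generation.append(new_id)
        current = next_generation
    return source_of


def mark_infections(rng, source_of, sar):
    """Post-process the tree: a contact is infected with probability `sar`,
    but only if the individual who contacted them was infected."""
    infected = {0}
    # ids increase down the tree, so a source is always visited before its contacts
    for contact, source in source_of.items():
        if source in infected and rng.random() < sar:
            infected.add(contact)
    return infected


rng = random.Random(1)
tree = simulate_contact_tree(rng, mean_contacts=3.0, dispersion=0.5, generations=3)
infected = mark_infections(rng, tree, sar=0.3)
print(len(tree), "individuals in the contact tree,", len(infected), "infected")
```

Contacts of uninfected individuals are generated here and then never used, which is the extra book-keeping cost mentioned above; there is also no clustering, since every contact is a new individual.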
I'd recommend (A) above as a first pass that will fulfil many user requirements (and direct them to something like covidhm for more complex requirements), but @sbfnk may have comments here.
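A minimal sketch of (A), to make the weighting explicit. It assumes the weights are the contact distribution itself, so that P(N = n | A = a) is proportional to Binomial(a; n, SAR) × P(N = n), with a negative binomial contact distribution. That reading of "weighted", the function names and the truncation at `max_contacts` are assumptions for illustration, not existing package code.

```python
import math
import random


def negbin_pmf(n, mean, dispersion):
    """Negative binomial pmf parameterised by mean and dispersion k."""
    log_p = (
        math.lgamma(n + dispersion) - math.lgamma(dispersion) - math.lgamma(n + 1)
        + dispersion * math.log(dispersion / (dispersion + mean))
        + n * math.log(mean / (dispersion + mean))
    )
    return math.exp(log_p)


def binom_pmf(a, n, p):
    """Probability of a transmissions among n contacts with attack rate p."""
    return math.comb(n, a) * p**a * (1 - p) ** (n - a)


def contacts_given_transmissions(a, sar, mean_contacts, dispersion, max_contacts=200):
    """pmf of the number of contacts N given a observed transmissions.

    Support starts at a (contacts >= transmissions) and is truncated at
    max_contacts, an arbitrary cut-off chosen for this sketch.
    """
    support = range(a, max_contacts + 1)
    weights = [
        binom_pmf(a, n, sar) * negbin_pmf(n, mean_contacts, dispersion)
        for n in support
    ]
    total = sum(weights)
    return {n: w / total for n, w in zip(support, weights)}


def sample_contacts(rng, a, sar, mean_contacts, dispersion):
    """Draw a contact count for a case that caused a transmissions."""
    pmf = contacts_given_transmissions(a, sar, mean_contacts, dispersion)
    return rng.choices(list(pmf), weights=list(pmf.values()))[0]


rng = random.Random(1)
for transmissions in (0, 2, 5):
    n_contacts = sample_contacts(rng, transmissions, sar=0.3,
                                 mean_contacts=5.0, dispersion=0.5)
    print(transmissions, "transmissions ->", n_contacts, "contacts")
```

Under these assumptions, cases that cause more transmissions get more contacts on average, and binomial thinning of negative binomial contacts gives a negative binomial offspring distribution with mean SAR × mean contacts and the same dispersion, which is one way to check that the inputs are consistent.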
@adamkucharski and I have discussed this issue and decided to implement a simple branching process in {simulist} that captures network effects by sampling from the excess degree distribution, and that book-keeps all contacts and infections in a single process (as stated in #35, where this code lives long-term is not yet decided).
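As a rough illustration of that idea (not the planned {simulist} implementation), the sketch below uses the standard definition of the excess degree distribution, q(k) = (k + 1) p(k + 1) / mean degree, for every case after the index case, and records contacts and infections in one pass. The function names, the record layout, the Poisson example distribution and the soft cap on outbreak size are all assumptions.

```python
import math
import random
from collections import deque


def excess_degree_pmf(degree_pmf):
    """q[k] = (k + 1) * p[k + 1] / mean degree, for k = 0 .. len(p) - 2."""
    mean_degree = sum(k * p for k, p in enumerate(degree_pmf))
    return [
        (k + 1) * degree_pmf[k + 1] / mean_degree
        for k in range(len(degree_pmf) - 1)
    ]


def simulate_outbreak(rng, degree_pmf, infect_prob, max_cases=500):
    """Record every contact and whether it was infected, in a single process."""
    excess_pmf = excess_degree_pmf(degree_pmf)
    records = [{"id": 0, "source": None, "infected": True}]
    to_process = deque([0])
    n_cases = 1
    # max_cases is a soft cap: it is only checked between infectors
    while to_process and n_cases < max_cases:
        source = to_process.popleft()
        # The index case draws from the contact distribution; later cases were
        # reached through a contact, so they draw from the excess degree
        # distribution (their contacts other than the one that infected them).
        pmf = degree_pmf if source == 0 else excess_pmf
        n_contacts = rng.choices(range(len(pmf)), weights=pmf)[0]
        for _ in range(n_contacts):
            new_id = len(records)
            infected = rng.random() < infect_prob
            records.append({"id": new_id, "source": source, "infected": infected})
            if infected:
                to_process.append(new_id)
                n_cases += 1
    return records


# Example contact distribution: Poisson with mean 2, truncated at 59 contacts
contact_pmf = [math.exp(-2.0) * 2.0**k / math.factorial(k) for k in range(60)]
records = simulate_outbreak(random.Random(1), contact_pmf, infect_prob=0.3)
print(len(records) - 1, "contacts,", sum(r["infected"] for r in records), "infected")
```

For a Poisson contact distribution the excess degree distribution is again Poisson with the same mean, so the network effect only shows up for overdispersed contact distributions, where individuals reached through a contact tend to have more contacts than average.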
Development for future versions of {simulist} will aim to accept individual-level transmission simulations from a variety of models/packages, e.g., {ringbp} and {covidhm}. This relates to the architecture of the package and will be tackled separately from the new simulation, in a future version.