SLiM engine truncates the number of diploids to match generation 1
Closed this issue · 3 comments
I am simulating the OutOfAfrica_3G09
model, which suggests that the maximum population size of the CEU population at time 0 is 29725 diploids:
import stdpopsim
species = stdpopsim.get_species("HomSap")
model = species.get_demographic_model("OutOfAfrica_3G09")
for pop, size in zip(model.populations, model.model.debug().population_size_trajectory([0])[0]):
print("At t=0", pop.name, "is predicted to be of size", size)
> At t=0 YRI is predicted to be of size 12300.0
> At t=0 CEU is predicted to be of size 29725.343546388514
> At t=0 CHB is predicted to be of size 54090.331077946525
However, if I ask for 29640 CEU samples at time 0, I only get back 29607 diploids:
samples = {'YRI': 1248, 'CHB': 312, 'CEU': 29640}
engine = stdpopsim.get_engine("slim")
contig = species.get_contig("chr17")
ts = engine.simulate(model, contig, samples, seed=123, slim_burn_in=10) # takes a few hours
print(len(ts.samples(population=[p.id for p in ts.populations() if p.metadata['name']=="CEU"][0]))//2, "CEU diploids")
>>> 29607 CEU diploids
This number is suspiciously close to the number of individuals predicted in generation 1:
for pop, size in zip(model.populations, model.model.debug().population_size_trajectory([1])[0]):
print("At t=1", pop.name, "is predicted to be of size", size)
> At t=1 YRI is predicted to be of size 12300.0
> At t=1 CEU is predicted to be of size 29606.679658197816
> At t=1 CHB is predicted to be of size 53793.650875455634
I thought that SLiM warns when it can't create all the requested samples, but I can't see anything in the error output.
Well let's see - we should be catching this error here.
Okay, thinking out loud. Notes:
We have two time scales:
- years ago:
_T
,T_start
, first column ofsubpopulation_splits
- generation number since the start of the simulation (
tick
):G
,G_start
,G_end=max(G)
Relevant code:
// The end of the burn-in is the starting tick, and corresponds to
// time T_start. All remaining events are relative to this tick.
N_max = max(N[0,0:(num_populations-1)]);
G_start = 1 + asInteger(round(burn_in * N_max));
T_start = max(_T);
G = G_start + gdiff(T_start, _T);
G_end = max(G);
// Return the number of generations that separate t0 and t1.
function (integer)gdiff(numeric$ t0, numeric t1) {
return asInteger(round((t0-t1)/generation_time/Q));
}
Conversion to tick
in SLiM is done for instance like:
g = G_start + gdiff(T_start, sampling_episodes[2,i]);
TODO:
- every time we use
gdiff( )
it must be likeG_start + gdiff(T_start, x)
; absorb the_start
things into the def'n record things innever mind, this is for a good reasonG
units notT
in first column ofsubpopulation_splits
Okay, here's a MWE:
import msprime
import stdpopsim
species = stdpopsim.get_species("HomSap")
engine = stdpopsim.get_engine("slim")
contig = species.get_contig("chr17", left=1e7, right=1e7+1e4)
mm = msprime.Demography()
mm.add_population(initial_size=100, name="pop_0", growth_rate=0.01)
mm.add_population_parameters_change(time=10, growth_rate=0.0)
model = stdpopsim.DemographicModel(
id='test',
description='foo',
long_description='fooooooo',
generation_time=3,
model=mm,
)
samples = {'pop_0' : 100}
ts = engine.simulate(model, contig, samples, seed=123, slim_burn_in=0.0)
This errors with
ERROR: Request to sample 100 individuals from p0 (pop_0) at tick 91, but only 99 individuals will be alive.
...and if you remove the code from the script that throws that error and run it in SLiM, you discover that actually only 98 individuals will be alive at the end.
Ah-ha. From the SLiM help:
– (void)setSubpopulationSize(integer$ size)
Set the size of this subpopulation to size individuals (see the SLiM manual for further details). This will take effect when children are next generated; it does not change the current subpopulation state.
So, although we set the subpopulation size in a growing population to be the correct value in the final tick, that doesn't take effect until the next tick (which doesn't happen).