SLiM can now read, and write tree sequences, which store genealogical data of populations.
SLiM can use a tree sequence produced by the coalescent simulator msprime
to initialize
a simulation, but to do so we need to add the relevant metadata. SLiM can also write out
history as a tree sequence, and in so doing it records extra information in metadata.
This package makes it easy to add the relevant metadata to a tree sequence so that SLiM
can use it, and to read the metadata in a SLiM-produced tree sequence.
The SLiM manual documents how the extra metadata is stored in the tree sequence files, and provides additional examples of how to use this package.
To install pyslim
, do
git clone https://github.com/tskit-dev/pyslim.git
cd pyslim
python setup.py install --user
You should also be able to install it with pip install pyslim
.
You'll also need an up-to-date msprime and SLiM.
To run the tests to make sure everything is working, do:
python -m nose tests
Note: if you use python3
you may need to replace python
with python3
above.
Coalescent simulation is more biologically limited, but is much faster than
forwards simluation, so it can be helpful to run a "hybrid" simulation, by
endowing a SLiM simulation with a history derived from a msprime simulation.
The pyslim.annotate()
command helps make this easy, by adding default
information to a tree sequence, allowing it to be read in by SLiM. This will
simulate a tree sequence with msprime, add SLiM information, and write it out
to a .trees
file:
import msprime
import pyslim
# simulate a tree sequence of 12 nodes
ts = msprime.simulate(12, mutation_rate=1.0, recombination_rate=1.0)
new_ts = pyslim.annotate_defaults(ts, model_type="nonWF", slim_generation=1)
new_ts.dump("initialize_nonWF.trees")
The resulting file slim_ts.trees
can be read into SLiM to be used as a starting state,
as illustrated in this minimal example:
initialize()
{
setSeed(23);
initializeSLiMModelType("nonWF");
initializeTreeSeq();
initializeMutationRate(1e-2);
initializeMutationType("m1", 0.5, "f", -0.1);
initializeGenomicElementType("g1", m1, 1.0);
initializeGenomicElement(g1, 0, 99);
initializeRecombinationRate(1e-2);
}
1 early() {
sim.readFromPopulationFile("initialize_nonWF.trees");
}
10 {
sim.treeSeqOutput("nonWF_restart.trees");
catn("Done.");
sim.simulationFinished();
}
See the SLiM manual for more about this operation.
Although we can initialize a SLiM simulation with the results of a coalescent simulation, if during the simulation we don't actually use the genotypes for anything, it can be much more efficient to only coalesce the portions of the first-generation ancestors that have not yet coalesced. (See the SLiM manual for more explanation, and the upcoming paper.) Doing this is as simple as:
ts = pyslim.load("unfinished.trees")
ts.recapitate(recombination_rate = 1e-6, Ne=1000)
If you have recorded a tree sequence in SLiM, likely you have not included any neutral mutations.
To add these (in a completely equivalent way to having included them during the simulation),
you can use the msprime.mutate( )
function, which returns a new tree sequence with additional mutations.
This works as follows:
ts = pyslim.load("unmutated.trees")
mut_ts = pyslim.SlimTreeSequence(msprime.mutate(ts, rate=1e-8, keep=True))
This will add infinite-sites mutations at a rate of 1e-8 per site, and will
keep
any existing mutations (and not add any new ones to the sites where they
exist already). We have wrapped the call to msprime.mutate
in a call to
pyslim.SlimTreeSequence
, because msprime.mutate
returns an msprime tree sequence,
and by converting it back into a pyslim
tree sequence we can still use the methods
defined by pyslim
. (The conversion does not modify the tree sequence at all,
it only adds the .slim_generation
attribute.) The output of other msprime
functions that return tree sequences may be converted back to
pyslim.SlimTreeSequence
in the same way.
To retrieve the extra information that SLiM stores in a tree sequence, use the extract_X_metadata()
functions, where X
is one of mutation
, population
, node
, or individual
.
For instance, to see the age of each Individual produced by annotation
in the previous example:
for ind in pyslim.extract_individual_metadata(new_ts.tables):
print(ind.age)
In this example, all the ages are 0 (the default).
To modify the metadata that pyslim
has introduced into a coalescent simulation,
or the metadata in a SLiM-produced tree sequence, use the annotate_X_metadata()
functions.
For instance, to set the ages of the individuals in the tree sequence to random numbers between 1 and 4,
and write out the resulting tree sequence:
import random
tables = new_ts.dump_tables()
ind_md = list(pyslim.extract_individual_metadata(tables))
for ind in ind_md:
ind.age = random.choice([1,2,3,4])
pyslim.annotate_individual_metadata(tables, ind_md)
mod_ts = pyslim.load_tables(tables, slim_format=True)
for ind in pyslim.extract_individual_metadata(mod_ts.tables):
print(ind.age)
mod_ts.dump("modified_ts.trees")
Here we describe the technical details.
Currently, the python package msprime
simulates tree sequences
and provides tools to work with them.
In the future, tools for working with tree sequences will be separated
into a package called tskit
.
pyslim
provides a thin interface between msprime/tskit
.
SLiM records additional information in the metadata columns of Population, Individual, Node, and Mutation tables.
The information is recorded in a binary format, and is extracted and written by pyslim
using the python struct
module.
Nothing besides this binary information can be stored in the metadata of these tables for the tree sequence to be used by SLiM,
and so when pyslim
annotates an existing tree sequence, anything in those columns is overwritten.
For more detailed documentation on the contents and format of the metadata, see the SLiM manual.
-
tskit
"nodes" correspond to SLiM "genomes". Individuals in SLiM are diploid, so each has two nodes. -
Since in SLiM, all individual are diploid, every individual will be associated with two nodes. The Individual table contains entries for
a. the currently alive individuals, b. any individuals that have been remembered with
treeSeqRememberIndividuals()
, and c. the first generation of the SLiM simulation.This last category is here because they are necessary for recapitation (described above); but they are not marked as samples, so will most likely be removed if you
simplify
the tree sequence. -
The "remembered nodes" will be the first nodes.