This project is a spatially explicit simulation of metacommunity dynamics that models habitat filtering and dispersal.
The simulation occurs on a regular grid of cells (termed the 'landscape') and each cell has discrete habitat type, 'A' or 'B'. Each cell contains a 'community' of individuals, which is fixed at a pre-determined carrying capacity. Communities may contain fewer individuals than the carrying capacity, but not more. The species pool consists of three types of species- generalists, and habitat specialists on habitat type A or B. Generalists produce offspring in both habitat types while specialists only produce offspring in their preferred habitat. After initializing a simulation landscape and species pool, the simulation proceeds through repeated iteration of four processes: birth - dispersal - establishment - death. Once established, individuals vacate their place in the community only through dispersal or death- the simulation does not model competitive displacement.
/Code
- ./CTSim: package source code and manual files
- ./HTML: html files for static documentation
- ./Parameters: parameter files for running and summarizing simulations. Subdirectories contain parameter files for different experiments described here. Parameter files for running simulations begin with 'p_' whereas parameter files for summarizing simulations begin with 's_'.
- ./Scripts: miscellaneous scripts for working with CTSim
- debugging.R: code for testing CTSim package
- make_package.R: code for building CTSim package
- make_parmfiles.R: code for generating parameter files used in experiments
- submit_runs.txt: shell code for submitting multiple jobs to cluster
- visualize_simulation.R: code for analyzing CTSim runs and experiments
- Current package version for Windows and Unix
/Results
- ./Plots: visualizations of simulation results, organized by experiment/run
- ./Summary: summary RData files of simulation results, organized by experiment/run
/Code/HTML
Static html help files for every function in the CTSim package are available here.
/Code/CTSim/
Files for installing the package are available in the Code directory. The most recent package version is the one with the highest version number.
- Create a set of simulation parameters and save in a directory (e.g. 'Parms'). See example_parameter_file.txt for an example file and make_parmfiles.R for an example of generating multiple parameter files. All parameter file names must begin with 'p_'.
- Decide on a directory where simulation results should be saved (e.g. 'Results').
- Move the run_simulation.R script from the 'exec/' directory of the CTSim package to your working directory.
- Make sure the CTSim package is installed. If conducting sims on the High Performance Cluster (e.g. Killdevil), then copy the tarball "CTSim_0.1.6.tar.gz" to a directory there, load
R
and typeinstall.packages('CTSim_0.1.6.tar.gz', lib = 'path_to_package', repo = NULL
, where 'path_to_package' is the directory the tarball is saved in. When loading the library, you will similarly need to specify that directory again inlibrary('CTSim', lib.loc = 'path_to_package')
. - Run R in batch mode on N cores with the run_simulation.R script:
R CMD BATCH "--args N Parms/ Results/" run_simulation.R myrun.Rout
NOTE: Simulation progress can be checked by examining the status of the output files ending in '.Rout' using more
.
- If the simulation stops before all runs are complete, restart the simulation without writing over already finished results using the 'restart_simulation.R' script (also found in the 'exec/' directory):
R CMD BATCH "--args N Parms/ Results/" restart_simulation.R myrun_restart.Rout
- Create a summary parameter file (or multiple files) and save in a directory (e.g. 'Parms'). See example_summary_parameter_file.txt for an example file. All summary parameter file names must begin with 's_'.
- Decide on a directory where simulation result summarise should be saved (e.g. 'Summaries').
- Move the summarize_simulation.R script from the 'exec/' directory of the CTSim package to your working directory.
- Run R in batch mode with the summarize_simulation.R script:
R CMD BATCH "--args Results/ Parms/ Summaries/" summarize_simulation.R myrun_summary.Rout
- Examine simulation summaries using your own code, or see visualize_simulation.R for examples of how to load and examine summaries from simulations on different sets of parameters.
A simulation requires three objects: a landscape, a species pool, and a global species abundance distribution (hereafter, gsad). The functions used to initialize these objects are described below.
make_landscape(x, y, mod, d, prop, draw_plot)
A landscape is a raster layer whose values are -1 or 1, corresponding to habitat types 'A' and 'B', respectively.
When a landscape is initialized, the user can specify its size (x,y
), the proportion of the cells which should belong to habitat type 'B' (prop
), a variogram model used to define the spatial autocorrelation of habitat values (mod
), and the distance at which habitat values become uncorrelated (e.g. the range of the variogram model: d
). Variogram models are implemented by the vgm()
function in gstat. Use show.vgms()
to see available models. Only the dimensions of the landscape are required. By default, the function will return a grid with 50% of habitat type 'A' and a exponential variogram model with partial sill = 1 and range = 1/3 of the grids smallest dimension.
make_species(S_A, S_B, S_AB, dist_b, m, r, dist_d, dist_v)
A species pool is a 3-dimensional array of species vital rates. The first dimension specifies the species. The second dimension defines in which habitat the rate applies ('A' or 'B') and the third dimension specifies the type of rate:
- b: birth rate- number of offpsring produced by established individual per timestep.
- m: mortality rate- probability that established individual dies during a time step.
- r: recruitment rate- probability than a propagule arriving at an open space will become an established individual
- d: dispersal rate- expected distance that a propagule will disperse from its origination point
- v: movement rate- expected distance that an established individual will move from its cell
Birth rates are positive in a species' preferred habitat and 0 elsewhere. Generalists prefer both habitat types equally. Dispersal rates control how newly produced propagules move away from their cell of origin. Movement rates control how established individuals move from their current cell. Movement, mortality, and recruitment rates can be set to differ systematically between preferred and non-preferred habitats.
A GSAD is a vector of species relative abundances on the "mainland". These define the relative probabilities that an immigrant from outside the landscape will belong to each species. Implemented distributions are 'same', 'uniform', 'power', 'logseries', 'lognormal', and 'poisson'. See the help file on make_sad
for further details.
populate_landscape(land, species, gsad=NULL, K, distribution=NA, p=NA, which_cells=NULL)
Once a landscape, species pool and GSAD have been generated, a simulation can be run. First, the landscape must be populated by species using the function populate_landscape
. This function has several options for defining the number of individuals in each cell (K
), the proportion of this carrying capacity which should be initially filled (p
), as well as how individuals should be distributed across the landscape (distribution
). The which_cells
option allows the user to specify the exact cells where individuals should be placed and the number to place in each cell. Individuals are drawn probabilistically from the specified GSAD (gsad
), which defaults to the same probability for all species.
The populated landscape is a metacommunity object which takes the form of a matrix of lists. The x
and y
positions define the location of each community on the landscape, while the list defines which individuals are present. The length of these lists are fixed for the remainder of the simulaton to the carrying capacity defined in K
. Integers refer to species identities and a 0
indicates that no individual is present. For example, if metacomm[1,2]
is equal to list(3,3,2,10,0,2,1)
, then the cell in the first row and second column can hold seven individuals and currently contains one individual of species 1, two individuals of species 2 and 3, one individual of species 10, and one empty space.
After a metacommunity has been generated, a simulation can be run for a fixed length of time (steps
) using the run_sim
function. In addition to the objects defining the metacommunity (metacomm
), landscape (land
), species pool (species
), and GSAD (gsad
), the function also requires the probability that an empty space will be colonized by an individual from the "mainland" (imm_rate
; this is a per individual probability which means that for a cell of 100 individuals, an imm_rate
of 0.01 means that one new individual is expected to appear from the mainland per timestep) as well as information about how propagules disperse (d_kernel
) and individuals move after they have established in a cell (v_kernel
). Three dispersal modes are implemented: half-gaussian, adjacent cell, and uniform. In all cases, the direction of dispersal is random (isotropic) and the expected dispersal distances are determined by dispersal and movement rates for individual species defined in the species pool object. See the help file on get_dispersal_vec
for additional information on specifying dispersal and movement parameters.
A simulation runs by iteratively calling the function run_timestep
, which defines the operations that occur in a single timestep. This function calls the main process functions die
, reproduce
, disperse
, establish
to progress a metacommunity through one simulation timestep. Each of these functions are described in detail in their help files. The order of operations is as follows:
- Death: Established individuals in each community experience probabilistic mortality according to species- and habitat-specific mortality rates provided in
species
. - Birth: Established individuals in each community produce propagules according to species- and habitat-specific birth rates provided in
species
. - Dispersal: New propagules from each community disperse across the landscape (
land
) according to species-specific dispersal rates provided inspecies
. The parameterd_kernel
specifies the dispersal kernel for new propagules. - Movement: Established individuals in each community move from their current cell with species- and habitat-specific movement rates provided in
species
. The parameterv_kernel
specifies the dispersal kernel for previously established individuals. - Establishment: Empty spaces in each community are colonized by either a migrant from outside the community with probability
imm_rate
or by an individual selected at random from the pool of new propagules and moving individuals that arrived in the cell. External migrants are chosen probabilistically from the relative abundances given ingsad
.
Once the simulation has run for a fixed number of timesteps, results are returns as an array of lists with three dimensions, where the third dimension indicates the timestep and the other two form a metacommunity object. An optional paramter save_steps
can be used to only record data from specific timesteps, but by default all time steps are returned, including the initial metacommunity. Dimension names record the actual timestep (0
for the initial metacommunity, 1
for the state of the community after one step, etc...). The user can also ask the tha function return an array that keeps track of the rate at which each species is gained and lost in each cell. This is specified through the parameter calc_rates
and if TRUE
, instead of returning a 3-dimensional metacommunity array, the function will return a list whose first element is the metacommunity array and whose second element is an array of species gain/loss rates in each cell, summed across the timesteps indicated in save_steps
.
Most users will primarily want to run simulations using the run_sim_N
and run_sim_P
functions. These functions run multiple simulations on a set of parameters or multiple sets of parameters, respectively. They are useful because the user does not need to initialize simulation objects (landscape, species pool, etc...), as this occurs internally. Results can be returned or saved to a directory.
This function runs multiple independent simulations on a set of parameters given as a list (parms
). A parameter list can be generated by the function make_parmlist
which compiles simulation parameters stored in the current (or specified) environment into a list. See the section on Parameter Files below for a list of parameter objects which are required versus optional.
Each run generates a new landscape, species pool, global species abundance distribution and initial metacommunity. Landscapes, species pools and gsads are saved as lists (object names: lands_N
, species_N
, gsad_N
) in a file named <simID>_simobjects.RData
. Simulations can be run in parallel by specifying nparallel > 1
, which requires the doParallel
and foreach
packages. By default, nparallel = 1
and the simulations proceed serially. Each run of the simulation is temporarily saved to the working directory or permanently saved to the directory specified by save_sim
. If this directory does not exist then it is created. Runs are saved in a subdirectory named simID
as <simID>_run<i>.RData
. This RData file contains five objects:
- results: an array of the metacommunity through time returned by
run_sim
or a list (ifcalc_rates=TRUE
) containing this array and an additional array of species gains/losses in each cell - this_land: the landscape used for the simulation
- this_species: the species pool used in the simulation
- this_gsad: the global species abundance distribution used in the simulation
- this_metacomm: the initial metacommunity
If return_results
is TRUE
, then after all simulations are complete, all runs are read back into memory, compiled arrays, and returned as a list by the function. The componenets of this list are results
, species
, lands
, and gsads
and the first dimension of each array is the run to which it corresponds. For example, results[1,,,]
is the metacommunity from run 1 through time. If return_results
is FALSE
then temporary files from the simulation runs are NOT removed, even if save_sim
is not specified. Users should be careful returning results for large simulations since the results object is generally too large for the working memory. It is better to save the results and then use summary functions (described below) to access them.
run_sim_N
can be used to restart a set of multiple simulation runs, but does not currently allow users to restart a simulation on an existing run. If restart
is TRUE
, the function navigates to the save_sim
directory and searches for the first instance of i <= nruns
where <simID>_run<i>.RData
does not exist. It then starts simulations for all i
that do not have saved files, using the objects saved in <simID>_simobjects.RData
.
By default, run_sim_N
runs in silent mode, but by specifying a number for report
users can request a timestamp to be written to STDOUT every time a fixed number of timesteps have passed. This can be useful for gauging how long simulations are taking. Finally, if CTSim
is installed in a directory not on the default search path, users should indicate where the package is installed using lib_loc
.
run_sim_P(ncores=1, parm_dir='./', results_dir='./Results/', sim_dir=NULL, report=0, restart=F)
This function is a wrapper for run_sim_N
which sequentially calls the function on each parameter file in the directory parm_dir
, defaulting to the working directory. Simulation runs are saved to a subdirectory of results_dir
named simID
, which must be defined in each parameter file. Using the same simID
in multiple parameter files will cause results to be saved over one another. Users can generate parameter files by creating a parameter list using make_parmlist
and then saving this object to a file using write_parms
. For an example see make_parmfiles.R. sim_dir
refers to the directory where CTSim
is installed, if not on the default search path. Specifying ncores > 1
will run the simulations in parallel requesting the defined number of cores.
R CMD BATCH "--args ncores parm_dir results_dir sim_dir report" run_simulation.R outfile.Rout
R CMD BATCH "--args ncores parm_dir results_dir sim_dir report" restart_simulation.R outfile.Rout
Two scripts are provided with the CTSim
package in the exec/
directory which can be used to run simulations in batch mode using R CMD BATCH
(see above for usage). run_simulation.R
calls run_sim_P
on command line arguments and starts new simulations, whereas restart_simulation.R
attempts to restart existing sets of simulation runs. Command line arguments must be speficied in order.
These two functions are used to summarize the results of a single simulation (summarize_sim
) or multiple runs from one set of simulation parameters (summarize_sim_N
). summarize_sim
calculates four main types of community descriptors on a given simulation run for a set of spatial and temporal units and then summarizes these descriptors across spatial and temporal units. A spatial unit is a collection of grid cells and a temporal unit is a collection of timepoints. Thus, by specifying different sets of spatial and temporal units the user can analyize a simulation at different spatial and temporal scales. aggregate_cells
is a useful function for creating sets of spatial units at different spatial scales which can be passed to the locs
argument.
The community descriptors that the function calculates are returned as arrays in a list:
- bio: richness and abundance of biologically core and transient species (based on birth rates)
- occ: richness and abundance of species in classes based on their temporal occupancy
- xclass: number of species in each of four categories: biologically core - occupancy core, biologically core - occupancy transient, biologically transient - occupancy core, and biologically transient - occupancy transient (see cross_classify
- abun: relative rank abundance of each species in the metacommunity
Details on how the summary function proceeds:
- Actual species' abundances are calculated for each spatial unit in
locs
and temporal unit int_window
. See documentation ofcalc_abun_profile
for more information on these parameters. - Observed species' abundance profiles are calculated by sampling actual abundance profiles using the detection probabilities provided in
P_obs
. - Observed abundance profiles are used to calculate species temporal occupancy in each of the spatial units, after aggregating timepoints according to
agg_times
(seecalc_occupancy
). - Richness and abundance of species in each of the temporal occupancy categories defined by breaks are calculated in each spatial unit for each timepoint. Timepoints may be aggregated prior to calculations using the
agg_times
argument insum_parms
. See (seecalc_rich_CT
) for details. - For each spatial unit, habitat values are averaged across the cells that comprise it (see
average_habitat
) and this average habitat type is used to determine which species are biologically core and transient (based on whether they have positive birth rates in that habitat). Richness and abundance of biologically core and transient species is then calculated for each spatial unit at each timepoint. As in the previous step, timepoints may be aggregated prior to calculations using theagg_times
argument insum_parms
. - For each spatial unit, species are classified as core or transient based on whether they fall in the first or last occupancy interval defined by breaks. Then, the number of species classified as biologically core - occupancy core, biologically core - occupancy transient, biologically transient - occupancy core, and biologically transient - occupancy transient are counted for each spatial unit.
- Thus far the simulation has calculated richness and abundance for each timepoint. The element
time_sum
insum_parms
defines which timepoints are used for summarizing richness and abundance. Iftime_sum = 'none'
then all timepoints are summarized individually. Other options are'mean'
for averaging across all timepoints int_window
or'last'
for only using the last timepoint int_window
. - After timepoints have been summarized (in step 7), summary statistics are calculated across all spatial units. Minimally, this is the mean and variance, but may also include any quantiles specified in the
quants
argument ofsum_parms
. - The relative abundance of each species is calculated across the entire metacommunity for each set of observed abundances (from step 2) and these abundances are returned in rank order.
summarize_sim_N
calls summarize_sim
for each independent simulation run. In addition, it also summarizes landscape properties across the spatial units specified in locs
using summarize_land
for the landscapes used in each simulation run. If sum_func
is provided, this function is used to summarize quantities across runs. The package includes a useful default summary function (default_sum_func
) which can be passed to sum_func
. It returns the mean, variance and qunatiles at 2.5%, 50% and 97.5%.
Simulation runs to summarize can be specified in two ways using the parameter sim
: by passing the list of simulation results returned by run_sim_N
or by passing the name of the directory where multiple simulation result files are saved. Simulation run files are identified by ending in 'run<i>.RData'
, where i
is a number. This is the naming format that is automatically generated by run_sim_N.
summarize_sim_P(run_dir = "./", parm_dir = "./", results_dir = "./Summaries/", cross_time = F)
Users can save summary parameters passed to summarize_sim_N
in a parameter file and then use summarize_sim_P
to summarize simulation results using one or more summary paramter files. Here is an exmaple of a summary parameter file. The function uses each summary parameter files in parm_dir
to summarize a set simulation runs saved in the directory run_dir
. Parameter filenames must start with 's_'
and simulation runs should all be from one set of simulation parameters (e.g. as saved by run_sim_N
). Summary parameter files are read into R using source()
and should therefore be R-readable. Each summary parameter file should also have a unique sumID
defined within it as well define objects that can be passed as parameters to summarize_sim_N
. Parameters requiring a value are breaks
, locs
and t_window
. The function will either summarize simulation results for a set time period defined in t_window
(default) or for multiple consecutive time windows across the entire simulation period (use cross_time=TRUE
), in which case t_window
defines the time interval and must be a list with named elements start and stop. If cross_time=FALSE
then two summary objects are saved to the .RData file:
sim_sum_ind
: includes a summary for each runsim_sum
: summarizes quantities across runs using the function defined insum_func
.
If cross_time=TRUE
then only sim_sum_ind
is saved and T=
is appended to the filename to denote the timestep at which the summary ends. For example, a simulation with 100 timepoints and t_window = list(start=91, stop=100)
will conduct summaries for time windows T1 - T10, T11 - T20 ,T21 - T30 , ... T91 - T100, and save each of these summaries to a separate file. In contrast the if the same t_window
is specified, but with cross_time=FALSE
, only the last time window will be analyzed and only one summary file saved.
Results are saved in the directory results_dir
and are not returned by the function. Filenames follow the convention '_summary.RData'
.
Two scripts are provided with the CTSim
package in the exec/
directory which can be used to summarize simulations in batch mode using R CMD BATCH
(see below for usage). summarize_simulation.R
calls summarize_sim_P
on command line arguments with cross_time = FALSE
, whereas summarize_simulation_cross_time.R
calls summarize_sim_P
with cross_time = TRUE
. Command line arguments must be speficied in order.
Usage:
Place the R script in the working directory, then use:
R CMD BATCH "--args ncores parm_dir results_dir sim_dir report" summarize_simulation.R outfile.Rout
R CMD BATCH "--args ncores parm_dir results_dir sim_dir report" summarize_simulation_cross_time.R outfile.Rout
/Code/Parameters/
See example_parameter_file.txt for an example of all possible parameters that can be provided for running a simulation and see baseline_parameter_file.txt for the set of parameters used as a basis for experiments. Required and optional parameters are described below:
The following parameters are required for simulation and must be present or the simulation will fail:
dimX
: x-dimension of landscapedimY
: y-dimension of landscapeS_A
: number of specialist species on habitat type AS_B
: number of specialist species on habitat type Bm_rates
: vector of mortality rates on preferred and non-preferred habitatsr_rates
: vector of recruitment rates on preferred and non-preferred habitatsK
: carrying capacity of cellsnsteps
: number of timesteps to simulatenruns
: number of independent simulations to run
For further details on S_A
, S_B
, m_rates
and r_rates
see make_species
. For further details on K
see populate_landscape
.
The following parameters are optional and more information can be found in the documentation on the functions they are passed to:
- Parameters passed to
make_landscape
vgm_dcorr
: distance at which habitat values become uncorrelatedvgm_mod
: variogram model controling spatial autocorrelation of habitat valueshabA_prop
: proportion of landscape that comprised of habitat type A- Parameters passed to
make_species
S_AB
: number of generalist speciesdist_b
: list defining the distribution from which species' birth rates are sampleddist_d
: list defining the distribution from which species' dispersal rates are sampled. Must contain character string namedtype
.dist_v
: list defining the distribution from which species' movement rates are sampled. Must contain character string namedtype
.dist_gsad
: list defining distribution from which global species abundances are sampled or'b_rates'
, indicating that the gsad should match species birth rates in their preferred habitat- Parameters passed to
populate_landscape
prop_full
: proportion of the landscape's carrying capacity that should initially contain individualsinit_distribute
: character string indicating how individuals should be initially distributed across the landscapecells_distribute
: ifinit_distribute
is'designated'
, a matrix giving the locations of cells in which to place propagules- Parameters passed to
run_sim
d_kernel
: list defining the shape of the dispersal kernel of new propagulesv_kernel
: list defining the shape of the movement kernel of established individualsimm_rate
: immigration rate- probability than an empty space will be colonized by a migrant from outside the metacommunity- Parameters passed to
run_sim_N
save_steps
: vector of timesteps to save in each simulationsimID
: character string that identifies simulations run on this set of parameters
/Code/Scripts/
- make_package.R: R code for compiling the package. Should be run to create new package versions and update documentation.
- make_parmfiles.R: R code for generating simulation parameter files used in the experiments
- visualize_simulations.R: R code for analyzing experiment results and generating visualizations
- submit_runs.txt: fragments of shell code used to submit simulation batches to computing cluster
- debugging.R: fragments of R code used in debugging CTSim functions
/Results
This directory holds the summary data files and visualizations from Experiments performed with CTSim. Actual data files are too large to store in this repository, but are saved in an untracked 'Data' directory here.
See wiki page for details on parameters used in each experiment and naming conventions. In general, files are named according to parameters that were varied. A '_' separates different parameters while a '-' associates a parameter with a value. For example a file including the string: 'd-g1_v-a0.5' indicates that the data refers to a simulation where the juvenile dispersal kernel was set to g1 (gaussian with distance 1) and the adult movement kernel was set to a0.5 (adjacent cell with probability 0.5).
- CONV: Initial runs testing for long-term convergence of simulation under different grid sizes (32x32 vs 64x64) and dispersal types (d) or birth rates (b). These were just used to determine that 200 timestes were adequate and shouldn't be used for further analysis.
- EXP1: Experiment 1 evaluates effects of juvenile dispersal (d), adult movement (v), and habitat spatial autocorrelation (dcorr)
See wiki page for specific parameters used in summaries or see the Parameters directory that corresponds to the experiment.