pSIMS is a suite of tools, data, and models developed to facilitate access to high-resolution climate impact modeling. This system largely automates the labor-intensive processes of creating and running data ingest and transformation pipelines and allows researchers to use high-performance computing to run simulations that extend over large spatial extents, run for many growing seasons, or evaluate many alternative management practices or other input configurations. In so doing, pSIMS dramatically reduces the time and technical skills required to investigate global change vulnerability, impacts and potential adaptations. pSIMS is designed to support integration and high-resolution application of any site-based climate impact model that can be compiled in a Unix environment (with a focus on primary production: agriculture, livestock, and forestry).
For more information about pSIMS, please see the following paper:
Elliott, J., D. Kelly, J. Chryssanthacopoulos, M. Glotter, Kanika Jhunjhnuwala, N. Best, M. Wilde, and I. Foster, (2014). The Parallel System for Integrating Impact Models and Sectors (pSIMS). Environmental Modeling and Software: Special Issue on Agricultural systems modeling & software. Available online, May 22, 2014. http://dx.doi.org/10.1016/j.envsoft.2014.04.008
Package | Location | Type |
---|---|---|
APSIM | https://www.apsim.info | Crop model |
Boost | http://www.boost.org | Required to run APSIM |
CenW | http://www.kirschbaum.id.au/Welcome_Page.htm | Generic forestry model |
DSSAT | http://dssat.net | Crop model |
Mono | http://www.mono-project.com | Required to run APSIM |
nco 4.4.3 | http://nco.sourceforge.net | Required for postprocessing |
netcdf4 | https://www.unidata.ucar.edu/software/netcdf/ | Required |
netcdf4 python libraries | https://github.com/Unidata/netcdf4-python | Required |
Oracle Java 7 | http://www.oracle.com/us/downloads/index.html | Required |
Swift 0.95 | http://swift-lang.org | Required |
In addition to these packages, several Python modules must be installed. They are listed in pysims/requirements.txt. To install them automatically, run the command "pip install -r requirements.txt" within a Python virtual environment. For more information on Python virtual environments, please see http://docs.python-guide.org/en/latest/dev/virtualenvs.
Simulating a single tile is useful for testing purposes. It allows you to verify that your parameters are set correctly and that the simulation results look reasonable. Create a new directory and change into it. The command for running a single tile simulation is:
Usage: pysims.py --campaign <campaign_dir> --param <param_file> --tlatidx <tile_latitude_index> --tlonidx <tile_longitude_index> [ --latidx <point_latitude_index> --lonidx <point_longitude_index> ]
If a point latidx and lonidx are specified, only that single point will be simulated rather than all points in the tile.
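The usage line above can be assembled programmatically. The following is a minimal sketch: the helper name `build_pysims_command` and the example paths and indexes are hypothetical, and only the flags shown in the usage line are used.

```python
import shlex


def build_pysims_command(campaign, param, tlatidx, tlonidx, latidx=None, lonidx=None):
    """Assemble the pysims.py argument list from the documented usage line."""
    # The point indexes are optional, but only make sense as a pair.
    if (latidx is None) != (lonidx is None):
        raise ValueError("latidx and lonidx must be given together")
    cmd = [
        "pysims.py",
        "--campaign", campaign,
        "--param", param,
        "--tlatidx", str(tlatidx),
        "--tlonidx", str(tlonidx),
    ]
    if latidx is not None:
        cmd += ["--latidx", str(latidx), "--lonidx", str(lonidx)]
    return cmd


# Hypothetical paths and indexes, for illustration only.
tile_cmd = build_pysims_command("campaign_dir", "params.yaml", "0024", "0044")
point_cmd = build_pysims_command("campaign_dir", "params.yaml", "0024", "0044", 3, 7)
print(shlex.join(tile_cmd))
print(shlex.join(point_cmd))
```

The resulting list could be passed to `subprocess.run`; how pysims.py is invoked on your system (for example, through a Python interpreter inside the virtual environment) is left open here.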
In most cases you'll want to simulate a group of tiles. Since this can be computationally expensive, this type of simulation will typically be done on a cluster or supercomputer. To accomplish this, pSIMS uses the Swift parallel scripting language. The "psims" script is a shell script used to start the simulations.
Usage: ./psims -s <sitename> -p <paramfile> -c <campaign> -t <tile_list> [ -split n ]
The sitename option determines where a run will take place. Currently, valid options are "sandyb", "westmere", and "local". The sandyb and westmere sites are for use on the Midway cluster at the University of Chicago. The "local" site assumes a 12-core machine. This can be tweaked by editing conf/swift.properties.
The params file defines the paths to inputs and outputs, the type of model to run, and which postprocessing steps need to happen.
The campaign option defines a directory that contains input files specific to a campaign.
The tile list is a set of latitude and longitude indexes that should be processed.
The -split option may be used to break up the simulation into smaller chunks. For example, a split of 2 will run a single tile across four different nodes. This can be useful for very dense tiles.
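To see why a split of 2 yields four chunks, the sketch below assumes that -split n divides a tile into n pieces along each axis, giving n * n chunks. This interpretation, the function names, and the 24 x 24 tile size are assumptions for illustration, not the actual pSIMS implementation.

```python
def split_ranges(n_cells, split):
    """Divide range(n_cells) into `split` contiguous, near-equal index ranges."""
    base, extra = divmod(n_cells, split)
    ranges, start = [], 0
    for i in range(split):
        # The first `extra` ranges absorb the remainder, one cell each.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def split_tile(n_lat, n_lon, split):
    """Return one (lat_range, lon_range) chunk per node: split * split in total."""
    return [
        (lat, lon)
        for lat in split_ranges(n_lat, split)
        for lon in split_ranges(n_lon, split)
    ]


# Hypothetical tile of 24 x 24 points, split of 2.
chunks = split_tile(n_lat=24, n_lon=24, split=2)
print(len(chunks))  # 4
```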
The parameter file is a YAML-formatted file containing all the parameters of a psims run. It defines things like the number of simulation years, the path to climate input files, and which model to use. Below is a list of parameters with a description of what each one does.
Parameter | Description |
---|---|
aggregator | Aggregator options, used to average a variable across a region |
checker | Checker translator and options, check if a tile should be simulated or not |
delta | Simulation delta, gridcell spacing in arcminutes |
executable | Name of executable and arguments to run for each grid |
lat_zero | Top edge of the northernmost grid cell in the campaign |
lon_zero | Left edge of the westernmost grid cell in the campaign |
long_names | Long names for variables, in same order that variables are listed |
model | Defines the type of model to run. Valid options are dssat45, dssat46, apsim75 |
num_lats | Number of latitudes to be included in final nc4 file (starting with lat_zero) |
num_lons | Number of longitudes to be included in final nc4 file (starting with lon_zero) |
num_years | Number of years to simulate |
out_file | Defines the prefix of the final nc4 filename |
outtypes | File extensions of files to include in output tar file |
refdata | Directory containing reference data. Will be copied to each simulation |
ref_year | Reference year (the first year of the simulation) |
scens | Number of scenarios in the campaign |
soils | Directory containing soils |
tappcmp | Campaign translator and options |
tappinp | Input translator and options, goes from experiment.json and soil.json to model specific files |
tapptilewth | Weather tile translator and options |
tapptilesoil | Soil tile translator and options |
tappnooutput | The "no output" translator and options, typically used to create empty data |
tappwth | Weather translator and options, converts .psims.nc format into model-specific weather files |
tdelta | Tile delta, gridcell spacing in arcminutes |
postprocess | Name of translator and options to run after running executable |
var_units | Units to use for each variable, in the same order that variables are listed |
variables | Define the variables to extract to final outputs |
weather | Defines the directory where weather data is stored |
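
The geometry parameters (lat_zero, lon_zero, delta, num_lats, num_lons) together describe a regular grid. The sketch below shows one way to turn a grid index into cell-center coordinates. It assumes indexes start at 1 and count southward and eastward from lat_zero and lon_zero; both the indexing convention and the example values (a global 30-arcminute grid) are assumptions, so check them against your own data.

```python
def cell_center(latidx, lonidx, params):
    """Cell-center (lat, lon) in degrees, assuming 1-based indexes."""
    step = params["delta"] / 60.0  # arcminutes -> degrees
    lat = params["lat_zero"] - (latidx - 0.5) * step  # moving south from the top edge
    lon = params["lon_zero"] + (lonidx - 0.5) * step  # moving east from the left edge
    return lat, lon


def grid_edges(params):
    """Southern and eastern edges of the grid implied by num_lats and num_lons."""
    step = params["delta"] / 60.0
    south = params["lat_zero"] - params["num_lats"] * step
    east = params["lon_zero"] + params["num_lons"] * step
    return south, east


# Hypothetical values standing in for a parsed parameter file.
params = {"lat_zero": 90.0, "lon_zero": -180.0, "delta": 30.0,
          "num_lats": 360, "num_lons": 720}
print(cell_center(1, 1, params))  # (89.75, -179.75)
print(grid_edges(params))         # (-90.0, 180.0)
```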
When pysims is run, the user must specify a campaign directory with the --campaign parameter. Typically this campaign directory contains two relevant files named Campaign.nc4 and exp_template.json. These files are used by the jsons2dssat and jsons2apsim translators to create experiment files for the crop model.
The exp_template.json file contains key-value pairs for data that will be written to the experiment file. These values represent things like fertilizer amounts, irrigation settings, and planting dates. Static settings for the experiment are stored in exp_template.json. Values that vary by lat, lon, scenario, or time get stored in Campaign.nc4.
Here is an example of irrigation definitions in exp_template.json.

    "dssat_simulation_control": {
        "data": [
            "irrigation": {
                "ithru": "100",
                "iroff": "GS000",
                "imeth": "IR001",
                "imdep": "40",
                "ireff": "1.0",
                "iramt": "10",
                "ithrl": "80"
            },...

However, users may not want to use these irrigation settings everywhere. If irrigation amounts (iramt) vary by location, users may create a variable in Campaign.nc4 called iramt. The most basic version is a NetCDF variable declared as float iramt(lat, lon). When pysims runs for a given point, the value for that point is transferred from Campaign.nc4 into the experiment file. If iramt is not defined in Campaign.nc4, the static value from exp_template.json is used instead.
Users may also want to define multiple irrigation amounts in exp_template.json. In this case a single iramt variable in Campaign.nc4 is ambiguous, because it is unclear which irrigation amount it corresponds to. For these cases pysims uses a numbering system in the Campaign.nc4 variable names: iramt_1 corresponds to the first instance of iramt in exp_template.json, iramt_2 to the second instance, and so on. This numbering works the same way for all variables, not just iramt.
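The override and numbering rules can be illustrated with plain dictionaries. This is a sketch of the lookup logic only, not the code used by the translators: the function name is hypothetical, `campaign_point` stands in for the values read from Campaign.nc4 at one lat/lon, and letting an unnumbered variable apply to every instance that has no numbered variable is an assumption of this sketch.

```python
def apply_campaign_values(node, campaign_values, counts=None):
    """Return a copy of `node` with leaf values overridden by campaign values."""
    if counts is None:
        counts = {}  # how many times each key has been seen so far
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                result[key] = apply_campaign_values(value, campaign_values, counts)
                continue
            counts[key] = counts.get(key, 0) + 1
            numbered = f"{key}_{counts[key]}"  # e.g. iramt_2 for the second iramt
            if numbered in campaign_values:
                result[key] = campaign_values[numbered]
            elif key in campaign_values:  # assumption: plain name applies to all
                result[key] = campaign_values[key]
            else:
                result[key] = value  # static value from the template
        return result
    if isinstance(node, list):
        return [apply_campaign_values(item, campaign_values, counts) for item in node]
    return node


# Hypothetical template with two irrigation events.
template = {"irrigation": [{"iramt": "10", "imeth": "IR001"},
                           {"iramt": "10", "imeth": "IR001"}]}
campaign_point = {"iramt_2": "25"}
resolved = apply_campaign_values(template, campaign_point)
print(resolved["irrigation"][0]["iramt"], resolved["irrigation"][1]["iramt"])  # 10 25
```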
The aggregation script is responsible for taking the final output of a psims simulation and computing the average value for a variable across some geographic region. To enable aggregation, add a section named 'aggregator' to your parameters file with the following parameters:
Parameter | Description |
---|---|
aggfile | Location of an aggfile. The aggfile contains information about geographic boundaries at given lats/lons. Common uses here are gadm regions and food producing units. |
weightfile | Location of the weightfile, used to give certain geographic areas more weight than others |
levels | Comma-separated list of levels from the aggfile (example: gadm0, gadm1, gadm2) |
The aggfile and weightfile must match the resolution used in your simulation. To generate a new aggfile you can use the gdal_rasterize utility to convert from a gadm shapefile to a netcdf file, then use bin/create_agg_limits.py to add the required variables and dimensions.
Example parameters:

    aggregator:
      aggfile: /path/to/agg.nc
      weightfile: /path/to/weight.nc
      levels: gadm0

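The regional averaging can be pictured as a weighted mean per region. The sketch below assumes the aggregated value is sum(weight * value) / sum(weight) over the grid cells of each region and that missing cells are skipped. The function name and the flat-list inputs are hypothetical, and the actual aggregation script may differ.

```python
from collections import defaultdict


def aggregate(values, regions, weights):
    """Weighted mean of `values` per region id; None values are skipped."""
    sums = defaultdict(float)
    totals = defaultdict(float)
    for value, region, weight in zip(values, regions, weights):
        if value is None:
            continue  # no simulation output for this cell
        sums[region] += value * weight
        totals[region] += weight
    # Regions whose total weight is zero are left out to avoid division by zero.
    return {r: sums[r] / totals[r] for r in sums if totals[r] > 0}


# Hypothetical cells: value from the simulation, region id from the aggfile,
# weight from the weightfile.
yields = [2.0, 4.0, 6.0, None]
region_ids = [1, 1, 2, 2]
cell_weights = [1.0, 3.0, 2.0, 5.0]
regional_means = aggregate(yields, region_ids, cell_weights)
print(regional_means)  # {1: 3.5, 2: 6.0}
```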
We have made two full global datasets available to pSIMS users:
AgMERRA Climate Forcing Dataset for Agricultural Modeling
Harmonized World Soil Database
Due to the size of these datasets, they are available only via Globus online. If you do not already have a Globus account, you may create one at globus.org. The endpoint name is davidk#psims. Harmonized World Soil Database files are available in the /soils/hwsd200.wrld.30min directory. AgMERRA climate data is available in the /clim/ggcmi/agmerra directory.
A tilelist file contains a list of latitude and longitude indexes to be processed, in the format "latidx/lonidx". Here is an example:
0024/0044
0024/0045
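A tilelist in this format is simple to read. The following sketch parses it into (latidx, lonidx) pairs. The function name is hypothetical, and skipping blank lines is an assumption. The indexes are kept as strings so that zero padding is preserved.

```python
def parse_tilelist(text):
    """Parse "latidx/lonidx" lines into a list of (latidx, lonidx) string pairs."""
    tiles = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        latidx, lonidx = line.split("/")
        tiles.append((latidx, lonidx))
    return tiles


tiles = parse_tilelist("0024/0044\n0024/0045\n")
print(tiles)  # [('0024', '0044'), ('0024', '0045')]
```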
The output/ directory contains a directory for each latitude index being processed. Within each latitude directory, a tar.gz file exists for each longitude index. For example, if your tilelist contained the grid 100/546, you would see an output file called runNNN/output/100/546output.tar.gz. This file is generated from within the Swift work directory. The files included in the archive are determined by the "outtypes" setting in your parameter file.
The parts/ directory contains the output NetCDF files for each grid being processed. When grid 0024/0044 is done processing, you will see a file called runNNN/parts/0024/0044.psims.nc.
The combined nc file is saved in the runNNN directory. Its name depends on the value of "out_file" in your params file. If you set out_file to "out.psims.apsim75.cfsr.whea", the final combined nc file would be called "out.psims.apsim75.cfsr.whea.nc4".
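The naming scheme described above can be summarized in a few lines. This sketch only builds the expected paths, following the examples in the text. The function names are hypothetical, and "runNNN" is kept as the placeholder used above.

```python
from pathlib import PurePosixPath


def tile_outputs(run_dir, latidx, lonidx):
    """Expected per-tile files inside a run directory."""
    run = PurePosixPath(run_dir)
    return {
        "archive": run / "output" / latidx / f"{lonidx}output.tar.gz",
        "part": run / "parts" / latidx / f"{lonidx}.psims.nc",
    }


def combined_file(run_dir, out_file):
    """Final combined file: the out_file prefix plus the .nc4 extension."""
    return PurePosixPath(run_dir) / f"{out_file}.nc4"


paths = tile_outputs("runNNN", "100", "546")
print(paths["archive"])  # runNNN/output/100/546output.tar.gz
print(paths["part"])     # runNNN/parts/100/546.psims.nc
print(combined_file("runNNN", "out.psims.apsim75.cfsr.whea"))
```

A check such as `Path(...).exists()` on these paths is one way to find tiles that have not finished yet.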
There may be times when a psims run fails. Failures may be caused by problems with the data, the hardware, or any of the intermediate programs involved. From within the runNNN directory, you may run any of the following scripts:

    $ ./resume.parts.sh       # Continue part generation from where a failed run has stopped
    $ ./rerun.parts.sh        # Rerun all part generation tasks
    $ ./resume.combinelon.sh  # Continue combinelon from where a failed run has stopped
    $ ./rerun.combinelon.sh   # Rerun all combinelon tasks
    $ ./resume.combinelat.sh  # Continue combinelat from where a failed run has stopped
    $ ./rerun.combinelat.sh   # Rerun all combinelat tasks
    $ ./resume.aggregate.sh   # Continue aggregation from where a failed run has stopped
    $ ./rerun.aggregate.sh    # Rerun all aggregation tasks

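The script names follow a regular pattern: a mode (resume or rerun) followed by a stage. A small helper, sketched here with hypothetical names, can select the right script for a wrapper that automates recovery.

```python
STAGES = ("parts", "combinelon", "combinelat", "aggregate")


def recovery_script(stage, rerun=False):
    """Return the script to run for `stage`, resuming by default."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    mode = "rerun" if rerun else "resume"
    return f"./{mode}.{stage}.sh"


print(recovery_script("parts"))                  # ./resume.parts.sh
print(recovery_script("aggregate", rerun=True))  # ./rerun.aggregate.sh
```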