pSIMS Overview

pSIMS is a suite of tools, data, and models developed to facilitate access to high-resolution climate impact modeling. This system largely automates the labor-intensive processes of creating and running data ingest and transformation pipelines and allows researchers to use high-performance computing to run simulations that extend over large spatial extents, run for many growing seasons, or evaluate many alternative management practices or other input configurations. In so doing, pSIMS dramatically reduces the time and technical skills required to investigate global change vulnerability, impacts and potential adaptations. pSIMS is designed to support integration and high-resolution application of any site-based climate impact model that can be compiled in a Unix environment (with a focus on primary production: agriculture, livestock, and forestry).

For more information about pSIMS, please see the following paper:

Elliott, J., D. Kelly, J. Chryssanthacopoulos, M. Glotter, K. Jhunjhnuwala, N. Best, M. Wilde, and I. Foster (2014). The Parallel System for Integrating Impact Models and Sectors (pSIMS). Environmental Modelling & Software: Special Issue on Agricultural systems modeling & software. Available online, May 22, 2014. http://dx.doi.org/10.1016/j.envsoft.2014.04.008

Software Dependencies

Package	Location	Type
APSIM	https://www.apsim.info	Crop model
Boost	http://www.boost.org	Required to run APSIM
CenW	http://www.kirschbaum.id.au/Welcome_Page.htm	Generic forestry model
DSSAT	http://dssat.net	Crop model
Mono	http://www.mono-project.com	Required to run APSIM
nco 4.4.3	http://nco.sourceforge.net	Required for postprocessing
netcdf4	https://www.unidata.ucar.edu/software/netcdf/	Required
netcdf4 python libraries	https://github.com/Unidata/netcdf4-python	Required
Oracle Java 7	http://www.oracle.com/us/downloads/index.html	Required
Swift 0.95	http://swift-lang.org	Required

In addition to these packages, a number of Python modules must be installed. They are listed in pysims/requirements.txt. To install them automatically, run the command "pip install -r requirements.txt" within a Python virtual environment. For more information on Python virtual environments, please see http://docs.python-guide.org/en/latest/dev/virtualenvs.
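Before a first run, it can help to confirm that the command-line dependencies are reachable. The following is a minimal sketch and not part of pSIMS. The executable names (ncks for NCO, java, swift, mono) are assumptions about how these packages are typically installed, so adjust them for your system.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
DEFAULT_TOOLS = ["ncks", "java", "swift", "mono"]


def missing_tools(tools=DEFAULT_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

The `which` argument exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.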

Compiling the Models

DSSAT 4.6:

The DSSAT source code is hosted on github.com in a private repository. Request access from the DSSAT group before cloning.
git clone git@github.com:DSSAT/dssat-csm.git (or https://github.com/DSSAT/dssat-csm.git for HTTPS)
cd dssat-csm
git checkout tags/v4.6.0.49
patch -p1 < /path_to_psims/models/pdssat/dssat46.patch
make
Executable will be created as DSCSM046.EXE
Use the example params file pysims/params.dssat.sample as a guide to help you get started

APSIM 7.9:

Install mono

git clone git@github.com:mono/mono.git
cd mono
git checkout tags/mono-4.8.1.0
Configure and set installation directory: ./configure --prefix=/installation/directory
make && make install
Add bin directory to PATH, and lib directory to LD_LIBRARY_PATH

Install mono-basic

git clone git@github.com:mono/mono-basic.git
cd mono-basic
Configure and set installation directory: ./configure --prefix=/installation/directory
make && make install
Add bin directory to PATH, and lib directory to LD_LIBRARY_PATH

Install APSIM

Check out the source: svn co http://apsrunet.apsim.info/svn/apsim/tags/Apsim79
cd Apsim79
patch -p0 < /path_to_psims/models/papsim/papsim79.patch
cd Model/Build
mcs VersionStamper.cs
mono VersionStamper.exe Directory=$PWD
./MakeAll.sh
Executable will be created as Model/ApsimModel.exe
Refer to example params file pysims/params.apsim.sample

Single Tile Simulation

Simulating a single tile is useful for testing purposes. It allows you to verify that your parameters are set correctly and that the simulation results look reasonable. Create a new directory and change into it. The command for running a single tile simulation is:

Usage: pysims.py --campaign <campaign_dir> --param <param_file> --tlatidx <tile_latitude_index> --tlonidx <tile_longitude_index> [ --latidx <point_latitude_index> --lonidx <point_longitude_index> ]

If a point latidx and lonidx are specified, only that single point will be simulated rather than all points in the tile.
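When scripting test runs, the command line can be assembled programmatically. The sketch below is illustrative only: it uses the options from the usage line above, while the helper name and the example paths are hypothetical. How pysims.py is invoked on your system (directly, or through the Python interpreter) is also an assumption.

```python
def build_pysims_command(campaign, param, tlatidx, tlonidx,
                         latidx=None, lonidx=None):
    """Build the argument list for a single-tile or single-point run."""
    # The point indexes are optional, but only make sense as a pair.
    if (latidx is None) != (lonidx is None):
        raise ValueError("latidx and lonidx must be given together")
    cmd = [
        "pysims.py",
        "--campaign", str(campaign),
        "--param", str(param),
        "--tlatidx", str(tlatidx),
        "--tlonidx", str(tlonidx),
    ]
    if latidx is not None:
        cmd += ["--latidx", str(latidx), "--lonidx", str(lonidx)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call.
    print(" ".join(build_pysims_command("/path/to/campaign", "params.yaml",
                                        "0024", "0044")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` so that a failed simulation raises an error instead of passing silently.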

Multi-Tile Simulation

In most cases you'll want to simulate a group of tiles. Since this can be computationally expensive, this type of simulation will typically be done on a cluster or supercomputer. To accomplish this, pSIMS uses the Swift parallel scripting language. The "psims" script is a shell script used to start the simulations.

Usage: ./psims -s <sitename> -p <paramfile> -c <campaign> -t <tile_list> [ -split n ]

The sitename option determines where a run will take place. Currently, valid options are "sandyb", "westmere", and "local". The sandyb and westmere sites are for use on the Midway cluster at the University of Chicago. The "local" site assumes a 12-core machine. This can be tweaked by editing conf/swift.properties.

The params file defines the paths to inputs and outputs, the type of model to run, and which postprocessing steps need to happen.

The campaign option defines a directory that contains input files specific to a campaign.

The tile list is a file containing the latitude and longitude indexes of the tiles that should be processed.

The -split option may be used to break up the simulation into smaller chunks. For example, a split of 2 will run a single tile across four different nodes. This can be useful for very dense tiles.
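The options described above can be assembled into a command line as in the following sketch. It only mirrors the usage line; the helper name and the example file names are hypothetical, and the check that the split value is a positive integer is an assumption rather than documented behavior.

```python
def build_psims_command(site, paramfile, campaign, tile_list, split=None):
    """Build the argument list for a multi-tile run started with ./psims."""
    cmd = ["./psims", "-s", site, "-p", paramfile,
           "-c", campaign, "-t", tile_list]
    if split is not None:
        # Assumption: the split value must be a positive integer.
        if int(split) < 1:
            raise ValueError("split must be a positive integer")
        cmd += ["-split", str(int(split))]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(" ".join(build_psims_command("local", "params.yaml",
                                       "/path/to/campaign", "tilelist.txt",
                                       split=2)))
```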

The parameter file

The parameter file is a YAML-formatted file containing all the parameters of a psims run. It defines things like the number of simulation years, the path to climate input files, and which model to use. Below is a list of parameters with a description of what each one does.

Parameter	Description
aggregator	Aggregator options, used to average a variable across a region
checker	Checker translator and options, check if a tile should be simulated or not
delta	Simulation delta, gridcell spacing in arcminutes
executable	Name of executable and arguments to run for each grid
lat_zero	Top edge of the North most grid cell in the campaign
lon_zero	Left edge of the West most grid cell in the campaign
long_names	Long names for variables, in same order that variables are listed
model	Defines the type of model to run. Valid options are dssat45, dssat46, apsim75
num_lats	Number of latitudes to be included in final nc4 file (starting with lat_zero)
num_lons	Number of longitudes to be included in final nc4 file (starting with lon_zero)
num_years	Number of years to simulate
out_file	Defines the prefix of the final nc4 filename
outtypes	File extensions of files to include in output tar file
refdata	Directory containing reference data. Will be copied to each simulation
ref_year	Reference year (the first year of the simulation)
scens	Number of scenarios in the campaign
soils	Directory containing soils
tappcmp	Campaign translator and options
tappinp	Input translator and options, goes from experiment.json and soil.json to model specific files
tapptilewth	Weather tile translator and options
tapptilesoil	Soil tile translator and options
tappnooutput	The "no output" translator and options, typically used to create empty data
tappwth	Weather translator and options, converts .psims.nc format into model-specific weather files
tdelta	Tile delta gridcell spacing in arcminutes
postprocess	Name of translator and options to run after running executable
var_units	Units to use for each variable, in the same order that variables are listed
variables	Define the variables to extract to final outputs
weather	Defines the directory where weather data is stored
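The grid parameters (lat_zero, lon_zero, delta, tdelta) together define how a geographic coordinate maps to an index. The sketch below shows one plausible mapping for tiles. It assumes that indexes count southward from lat_zero and eastward from lon_zero, and that the first index is 1; neither the function nor the 1-based convention is confirmed by this document, so verify against your own campaign before relying on it.

```python
import math


def tile_index(lat, lon, lat_zero, lon_zero, tdelta, first_index=1):
    """Map a coordinate in degrees to (latidx, lonidx) for a tile grid.

    tdelta is the tile spacing in arcminutes. The value of first_index
    (1-based here) is an assumption, not documented pSIMS behavior.
    """
    step = tdelta / 60.0  # arcminutes to degrees
    # Latitude indexes grow southward from the top edge (lat_zero).
    latidx = int(math.floor((lat_zero - lat) / step)) + first_index
    # Longitude indexes grow eastward from the left edge (lon_zero).
    lonidx = int(math.floor((lon - lon_zero) / step)) + first_index
    return latidx, lonidx


if __name__ == "__main__":
    # A global grid with 120-arcminute (2-degree) tiles.
    print(tile_index(43.0, -93.0, lat_zero=90.0, lon_zero=-180.0, tdelta=120))
```

The same arithmetic applies to point indexes within the grid if delta is used in place of tdelta, under the same assumptions.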

Campaign Files

When pysims is run, the user must specify a campaign directory with the --campaign parameter. Typically this campaign directory contains two relevant files named Campaign.nc4 and exp_template.json. These files are used by the jsons2dssat and jsons2apsim translators to create experiment files for the crop model.

The exp_template.json file contains key-value pairs for data that will be written to the experiment file. These values represent things like fertilizer amounts, irrigation settings, and planting dates. Static settings for the experiment are stored in exp_template.json. Values that vary by lat, lon, scenario, or time get stored in Campaign.nc4.

Here is an example of irrigation definitions in exp_template.json.

~~~
"dssat_simulation_control": {
  "data": [
    "irrigation": {
      "ithru": "100",
      "iroff": "GS000",
      "imeth": "IR001",
      "imdep": "40",
      "ireff": "1.0",
      "iramt": "10",
      "ithrl": "80"
    },...
~~~

Users may not want to use these irrigation settings everywhere. If they have a collection of irrigation amounts (iramt) that change by location, they may create a variable in Campaign.nc4 called iramt. The most basic version of this would be a NetCDF variable in the format of float iramt(lat, lon). When pysims runs for a given point, the appropriate value is transferred from Campaign.nc4 into the experiment file. If iramt is not defined in Campaign.nc4, the static value from exp_template.json is used instead.

There may be situations where users want to define multiple irrigation amounts in exp_template.json. In this case a single iramt variable in Campaign.nc4 is ambiguous, because it is not clear which irrigation amount it corresponds to. For these cases pysims uses a numbering system in the Campaign.nc4 variable names. The variable iramt_1 corresponds to the first instance of iramt in exp_template.json, iramt_2 corresponds to the second instance, and so on. This process works the same way for all variables, not just iramt.
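The precedence between Campaign.nc4 and exp_template.json can be summarized in a few lines. The sketch below is not the pysims implementation: it uses plain dictionaries in place of NetCDF variables, the function name is hypothetical, and it assumes an unnumbered variable such as iramt is applied only when the template contains a single instance of that key.

```python
def resolve_values(key, static_values, campaign_values):
    """Pick the value for each instance of `key` in the template.

    static_values:   values of `key` in template order (from exp_template.json)
    campaign_values: mapping of variable name to the value already looked up
                     for the current point (stands in for Campaign.nc4)
    """
    resolved = []
    for position, static in enumerate(static_values, start=1):
        numbered = "{}_{}".format(key, position)
        if numbered in campaign_values:
            # iramt_1, iramt_2, ... override the matching instance.
            resolved.append(campaign_values[numbered])
        elif len(static_values) == 1 and key in campaign_values:
            # Assumption: the unnumbered name applies to a lone instance only.
            resolved.append(campaign_values[key])
        else:
            # No override defined: keep the static template value.
            resolved.append(static)
    return resolved


if __name__ == "__main__":
    print(resolve_values("iramt", ["10", "20"], {"iramt_2": 30.0}))
```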

Aggregation

The aggregation script is responsible for taking the final output of a psims simulation and computing the average value for a variable across some geographic region. To enable aggregation, add a section named 'aggregator' to your parameters file with the following parameters:

Parameter	Description
aggfile	Location of an aggfile. The aggfile contains information about geographic boundaries at given lats/lons. Common uses here are gadm regions and food producing units.
weightfile	Location of the weightfile, used to give certain geographic areas more weight than others
levels	Comma separated list of levels from the aggfile (example: gadm0, gadm1, gadm2)

The aggfile and weightfile must match the resolution used in your simulation. To generate a new aggfile, you can use the gdal_rasterize utility to convert a gadm shapefile to a NetCDF file, then use bin/create_agg_limits.py to add the required variables and dimensions.

Example parameters:

~~~
aggregator:
  aggfile: /path/to/agg.nc
  weightfile: /path/to/weight.nc
  levels: gadm0
~~~
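To make the roles of the aggfile and weightfile concrete, the following sketch computes a weighted average per region from flat lists. It is an illustration of the general idea only, not the pSIMS aggregation script: the function name is hypothetical, and skipping missing values and non-positive weights is an assumption.

```python
from collections import defaultdict


def weighted_region_means(values, regions, weights):
    """Return {region_id: weighted mean of values in that region}.

    values:  one simulated value per grid cell (None marks missing data)
    regions: region id per grid cell (the role played by the aggfile)
    weights: weight per grid cell (the role played by the weightfile)
    """
    totals = defaultdict(float)
    weight_sums = defaultdict(float)
    for value, region, weight in zip(values, regions, weights):
        # Assumption: cells without data or without weight are ignored.
        if value is None or weight <= 0:
            continue
        totals[region] += value * weight
        weight_sums[region] += weight
    return {region: totals[region] / weight_sums[region]
            for region in weight_sums}


if __name__ == "__main__":
    print(weighted_region_means([2.0, 4.0, 10.0], [1, 1, 2], [1.0, 3.0, 2.0]))
```

In this example region 1 averages to 3.5 because the cell with value 4.0 carries three times the weight of the cell with value 2.0.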

Obtaining Data

We have made two full global datasets available to pSIMS users:

AgMERRA Climate Forcing Dataset for Agricultural Modeling

Harmonized World Soil Database

Due to the size of these datasets, they are available only via Globus online. If you do not already have a Globus account, you may create one at globus.org. The endpoint name is davidk#psims. Harmonized World Soil Database files are available in the /soils/hwsd200.wrld.30min directory. AgMERRA climate data is available in the /clim/ggcmi/agmerra directory.

Tilelists

A tilelist file contains a list of latitude and longitude indexes to be processed, one tile per line, in the format "latidx/lonidx". Here is an example:

0024/0044

0024/0045
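A tilelist can be written and read with a few lines of code. The sketch below assumes that indexes are zero-padded to four digits, as in the example above; the helper names are hypothetical.

```python
def format_tile(latidx, lonidx, width=4):
    """Format one tilelist entry, e.g. (24, 44) -> '0024/0044'."""
    # The four-digit padding is taken from the example entries.
    return "{:0{w}d}/{:0{w}d}".format(latidx, lonidx, w=width)


def parse_tilelist(text):
    """Parse tilelist text into a list of (latidx, lonidx) integer pairs."""
    tiles = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        latidx, lonidx = line.split("/")
        tiles.append((int(latidx), int(lonidx)))
    return tiles


if __name__ == "__main__":
    # Build a tilelist covering two adjacent tiles.
    print("\n".join(format_tile(24, lon) for lon in (44, 45)))
```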

Output Files

The output/ directory contains a directory for each latitude index being processed. Within each latitude directory, a tar.gz file exists for each longitude index. For example, if your tilelist contained the tile 100/546, you would see an output file called runNNN/output/100/546output.tar.gz. This file is generated from within the Swift work directory. The files it includes are determined by the "outtypes" setting in your parameter file.

The parts/ directory contains the output NetCDF files for each tile being processed. When tile 0024/0044 is done processing, you will see a file called runNNN/parts/0024/0044.psims.nc.

The combined nc file is saved in the runNNN directory. Its name depends on the value of "out_file" in your params file. If you set out_file to "out.psims.apsim75.cfsr.whea", the final combined nc file would be called "out.psims.apsim75.cfsr.whea.nc4".
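After a run, the parts/ layout makes it straightforward to check which tiles did not produce a part file. The sketch below assumes that part files are named parts/<latidx>/<lonidx>.psims.nc using the same index strings that appear in the tilelist; the function name is hypothetical.

```python
import os


def missing_parts(run_dir, tiles):
    """Return the tilelist entries that have no part file under run_dir.

    tiles: iterable of 'latidx/lonidx' strings, as found in a tilelist.
    """
    missing = []
    for tile in tiles:
        tile = tile.strip()
        if not tile:
            continue
        latidx, lonidx = tile.split("/")
        # Assumption: the file name reuses the tilelist index strings as-is.
        part = os.path.join(run_dir, "parts", latidx, lonidx + ".psims.nc")
        if not os.path.isfile(part):
            missing.append(tile)
    return missing


if __name__ == "__main__":
    # 'runNNN' is the placeholder run directory name used in this document.
    print(missing_parts("runNNN", ["0024/0044", "0024/0045"]))
```

A non-empty result suggests that part generation did not complete for those tiles.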

Rerunning and Restarting Failed Runs

There may be times when a psims run fails. Failures may be caused by problems with the data, the hardware, or any of the intermediate programs involved. From within the runNNN directory, you may run any of the following scripts:

$ ./resume.parts.sh       # Continue part generation from where a failed run has stopped
$ ./rerun.parts.sh        # Rerun all part generation tasks
$ ./resume.combinelon.sh  # Continue combinelon from where a failed run has stopped
$ ./rerun.combinelon.sh   # Rerun all combinelon tasks
$ ./resume.combinelat.sh  # Continue combinelat from where a failed run has stopped
$ ./rerun.combinelat.sh   # Rerun all combinelat tasks
$ ./resume.aggregate.sh   # Continue aggregation from where a failed run has stopped
$ ./rerun.aggregate.sh    # Rerun all aggregation tasks
