/sheds-temp-model

SHEDS Stream Temperature Model of the Northeast U.S.

Primary language: R

SHEDS Stream Temperature Model

Jeffrey D. Walker, PhD
Walker Environmental Research LLC

Ben Letcher, PhD
USGS, UMass

Daniel Hocking, PhD
Frostburg State University

Adapted from: Northeast Temperature Model

About

This repo contains the source code for the SHEDS Stream Temperature Model.

Quick Start

To set up and run the SHEDS Stream Temperature Model, follow these steps:

  1. Set up Configuration File
  2. Set version number in Version File
  3. Create Working Directory
  4. Run the Model
  5. Upload Results
  6. Update Documentation

Each of these steps is described in more detail in the following sections.

Configuration

Most of the scripts in this repo rely on configuration variables set within the config.sh file. Because some of these variables contain sensitive information (e.g. database passwords), the config.sh file is not tracked in git.

The configuration includes the database connection parameters and the local path to the model data directory.

A template (config.template.sh) is provided, from which a new config.sh can be created:

cp config.template.sh config.sh
nano config.sh

The config.sh file must contain the following variables:

# database connection
SHEDS_STM_DB_HOST=""
SHEDS_STM_DB_PORT=5432
SHEDS_STM_DB_USER=""
SHEDS_STM_DB_PASSWORD=""
SHEDS_STM_DB_DBNAME=""

# model root directory
SHEDS_STM_ROOT="/path/to/temp-model-data"
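
Because a missing or empty variable tends to surface only later as a confusing database or path error, it can help to check the configuration right after editing it. The following is a minimal sketch, not part of the repo: check_config is a hypothetical helper that only tests whether each variable listed above is set and non-empty.

```shell
# check_config is a hypothetical helper, not part of the repo.
# It succeeds only if every variable listed above is set and non-empty.
check_config() {
  missing=0
  for name in SHEDS_STM_DB_HOST SHEDS_STM_DB_PORT SHEDS_STM_DB_USER \
              SHEDS_STM_DB_PASSWORD SHEDS_STM_DB_DBNAME SHEDS_STM_ROOT; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing or empty: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example use (assumes config.sh is in the current directory):
#   . ./config.sh && check_config && echo "config OK"
```

The check says nothing about whether the values are correct (e.g. whether the database is reachable), only that they have been filled in.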

Versioning

The model versioning approach is loosely based on semantic versioning.

Each version contains three numbers of the form X.Y.Z:

  • X: Major version incremented when there is a major change to the underlying model theory or code.
  • Y: Minor version incremented when a new set of model inputs and outputs is created. This can be due either to an update of the input datasets or to a (minor) change in the code.
  • Z: Patch version incremented when there is a minor change to the documentation or output files, but no change to the model calibration or prediction datasets.

The full version is therefore used to track changes to both the model and the documentation. Model calibration and results do not change between patch versions that share the same minor version (X.Y).

The major and minor versions are stored in an environment variable called SHEDS_STM_VERSION, which is set within the version.sh file. For example:

SHEDS_STM_VERSION=1.0

Unlike the configuration file, the version.sh file is tracked by git to ensure the model version coincides with model source code. Any changes to the model code should be associated with a change to the version number.

The version can be set to any string, because it is simply used to generate the model working directory. During development, for example, the version could be set to 1.0-dev.

For official model releases, the version should use two-point semantic versioning of the form X.Y where X is the major version and Y is the minor version. The minor version (Y) should be incremented when there are only changes to the input dataset and calibration. The major version (X) should be incremented when there are more significant changes to the model structure or set of predictor variables.

Important: The environment variable SHEDS_STM_VERSION should be set to the major and minor version only, and should not include the patch number. In other words, it takes the form X.Y only (see the definitions above) and does not include a v prefix.
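
For official releases, the X.Y rule can be checked mechanically before a model run. The sketch below is an illustration only: is_release_version is a hypothetical helper, and it is deliberately stricter than the model itself, which accepts any string (such as 1.0-dev) during development.

```shell
# is_release_version is a hypothetical helper, not part of the repo.
# It succeeds only for strings of the form X.Y, where X and Y are whole
# numbers: no v prefix, no patch number, no suffix such as -dev.
is_release_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+$'
}

# Example use (assumes version.sh is in the current directory):
#   . ./version.sh
#   if ! is_release_version "${SHEDS_STM_VERSION}"; then
#     echo "not a release version: ${SHEDS_STM_VERSION}" >&2
#   fi
```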

When a new version of the model is complete, a tagged release should be created on GitHub with the full version of the model (vX.Y.Z) and a title containing both the version and the date (e.g. vX.Y.Z (MMM DD, YYYY)).
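
The tag name and release title described above can be assembled from the X.Y version and a patch number. This is a rough sketch under stated assumptions: release_tag and release_title are hypothetical helpers, and the git commands in the comments only create and push the tag; the release itself would still need to be published on GitHub.

```shell
# release_tag and release_title are hypothetical helpers, not part of the repo.

# $1 = X.Y (the value of SHEDS_STM_VERSION), $2 = patch number Z
release_tag() {
  printf 'v%s.%s\n' "$1" "$2"
}

# $1 = X.Y, $2 = patch number Z, $3 = release date as MMM DD, YYYY
release_title() {
  printf '%s (%s)\n' "$(release_tag "$1" "$2")" "$3"
}

# Example use (assumes version.sh is in the current directory):
#   . ./version.sh
#   tag="$(release_tag "${SHEDS_STM_VERSION}" 0)"
#   title="$(release_title "${SHEDS_STM_VERSION}" 0 "$(date '+%b %d, %Y')")"
#   git tag -a "${tag}" -m "${title}"
#   git push origin "${tag}"
```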

Working Directory

In order to transfer data from one script to another, all of the scripts save and load data from a common directory. This directory is referred to as the model's working directory, but should NOT be confused with the working directory used by R (i.e. getwd()), which is the directory from which the scripts are run. In other words, the model's working directory stores the data files, while R's working directory contains the source scripts.

The path to the model's working directory is automatically generated by combining a model root path and the model version (i.e. /<root path>/<version>). The root path can therefore contain one or more sub-directories, each of which is the working directory for a specific version of the model.

The root directory should therefore look something like this:

$ tree ${SHEDS_STM_ROOT}

├── 0.9
│   └── ...
└── 1.0
    ├── data-breakpoints.rds
    ├── data-clean.rds
    ├── data-covariates.rds
    ├── data-daymet.csv
    ├── data-db.rds
    ├── data-huc.rds
    ├── daymet-featureid_year.csv
    ├── locations-exclude.txt
    ├── locations-flowlines-distance.csv
    ├── locations-impoundment.txt
    ├── locations-tidal.txt
    ├── model-diagnostics.rds
    ├── model-input.rds
    ├── model-output.rds
    ├── model-predict-derived.csv
    ├── model-predict-derived.rds
    ├── model-predict-year.rds
    ├── temp-model.jags
    └── temp-model.log

The root path and model version are set using two environment variables:

  • SHEDS_STM_ROOT: within the config.sh file (see Configuration), this variable should be set to a local path that serves as the root directory for all model versions (e.g. /path/to/temp-model-data)
  • SHEDS_STM_VERSION: within the version.sh file (see Versioning), this variable should be set to a unique model version

The load_config() R function (defined in r/functions.R) combines these two variables to create the complete path to the current model working directory, which is stored in the wd element of the list returned by the function.

Here is an example of how these files work together:

# config.sh
SHEDS_STM_ROOT="/path/to/temp-model-data"

# version.sh
SHEDS_STM_VERSION="1.0"

# R
> source("functions.R")
> config <- load_config()
> print(config$wd)
[1] "/path/to/temp-model-data/1.0"

The config$wd path is then used throughout the various R scripts to load and save data from a single directory.

Important: When creating a new model version, the user must manually create the working directory on their file system. The R scripts will NOT create this directory automatically. The load_config() function will return an error if it cannot find the working directory, in which case you simply need to create the directory and try again.
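
Creating the working directory by hand can be done from the shell using the same /<root path>/<version> rule. The following is a minimal sketch: model_wd and ensure_model_wd are hypothetical helpers, not part of the repo, and they only mirror the path rule described above rather than calling load_config().

```shell
# model_wd and ensure_model_wd are hypothetical helpers, not part of the repo.

# Prints <root>/<version>, dropping one trailing slash from the root if present.
# $1 = root path (SHEDS_STM_ROOT), $2 = version (SHEDS_STM_VERSION)
model_wd() {
  printf '%s/%s\n' "${1%/}" "$2"
}

# Creates the working directory if it does not exist yet.
ensure_model_wd() {
  wd="$(model_wd "$1" "$2")"
  if [ -d "${wd}" ]; then
    echo "working directory already exists: ${wd}" >&2
  else
    mkdir -p "${wd}" && echo "created: ${wd}"
  fi
}

# Example use (assumes config.sh and version.sh are in the current directory):
#   . ./config.sh && . ./version.sh
#   ensure_model_wd "${SHEDS_STM_ROOT}" "${SHEDS_STM_VERSION}"
```

The message for an existing directory is only a reminder: an existing directory may already hold results from a previous run of the same version.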

Model Execution

The model is run by executing a series of R scripts located in the r/ directory.

These scripts form a chain of tasks. Each script performs one task, such as fetching raw input data, merging datasets to generate a model input dataset, calibrating the model, or generating predictions.

The scripts must be run in a specific order to ensure that all inputs for a given script have already been generated by previous scripts. The scripts are listed in order and explained in run-model.sh.

In theory, one could execute run-model.sh to run all the R scripts sequentially after completing steps 1-3 in Quick Start.

./run-model.sh

Remember to change the model version within version.sh and create the corresponding working directory before running this script, or you will overwrite previous results.

Or the scripts can be run individually by walking through run-model.sh line by line (this is a safer approach to ensure each step is successful):

cd scripts
./locations-exclude.sh
./locations-flowlines-distance.sh
# and so on

Alternatively, each script can be opened in RStudio and run line by line in an interactive session. If using RStudio, open the r/r.proj file to load the project.

The last option is probably the most practical for new users to understand what the model scripts do exactly.
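
For non-interactive runs, the two concerns raised above (overwriting previous results, and making sure each step succeeds before the next one starts) can be guarded against with a small wrapper. This is a sketch under stated assumptions: wd_is_empty and run_steps are hypothetical helpers, and the script names in the usage comment are simply the two shown earlier in this section.

```shell
# wd_is_empty and run_steps are hypothetical helpers, not part of the repo.

# Succeeds only if the directory exists and has no entries (hidden files included).
wd_is_empty() {
  [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Runs each command in order and stops at the first one that fails.
run_steps() {
  for step in "$@"; do
    echo "running: ${step}"
    if ! "${step}"; then
      echo "step failed: ${step}" >&2
      return 1
    fi
  done
}

# Example use, after loading config.sh and version.sh:
#   if wd_is_empty "${SHEDS_STM_ROOT}/${SHEDS_STM_VERSION}"; then
#     run_steps ./locations-exclude.sh ./locations-flowlines-distance.sh
#   else
#     echo "working directory missing or not empty; check version.sh" >&2
#   fi
```

Refusing to run on a non-empty directory is a conservative choice. Resuming a partially completed run would need a different check.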

Results

The derived metrics for each catchment are exported to:

  1. database table temp_model (via r/export-db.R)
  2. CSV file r/csv/sheds-temp-model-v{VERSION}.csv (via r/export-csv.R)

After completing the model run, the output CSV file should be copied to the server within the www/static/models/temp-model/output folder.
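
The copy step might look like the sketch below. It assumes that {VERSION} in the CSV file name is the value of SHEDS_STM_VERSION and that the command is run from the repo root. output_csv_path and upload_csv are hypothetical helpers, and the server address in the usage comment is a placeholder because this README does not give one.

```shell
# output_csv_path and upload_csv are hypothetical helpers, not part of the repo.

# Prints the CSV path for a given version, following the pattern listed above.
output_csv_path() {
  printf 'r/csv/sheds-temp-model-v%s.csv\n' "$1"
}

# Copies the CSV for one version to a destination, stopping if the file is missing.
# $1 = version, $2 = scp destination
upload_csv() {
  csv="$(output_csv_path "$1")"
  if [ ! -f "${csv}" ]; then
    echo "output file not found: ${csv}" >&2
    return 1
  fi
  scp "${csv}" "$2"
}

# Example use (user@example.org and /path/to are placeholders):
#   . ./version.sh
#   upload_csv "${SHEDS_STM_VERSION}" \
#     "user@example.org:/path/to/www/static/models/temp-model/output/"
```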

Documentation

After running all of the model scripts, the model documentation should be updated.

The documentation is written using the bookdown package and is located within the r/docs sub-directory.

The home page for the documentation is written in the index.Rmd file. The remaining sections are each written within their own R markdown file (e.g. 01-theory.Rmd).

The documentation can be generated from the source Rmd files using the Build Book button within the Build pane in RStudio. Alternatively, the following command can be used:

rmarkdown::render_site(encoding = 'UTF-8')

The static output files (i.e. static HTML, CSS, and JavaScript) can be found within the r/docs/_book sub-directory.

After the full documentation is initially generated, individual sections can be edited and re-rendered using the Knit button in RStudio, similar to rendering individual Rmd files.

When documentation for the current model version is complete, the static output files in r/docs/_book can be copied to the appropriate location on the web server using FTP, scp or some other transfer protocol.