paper-yadkin-swat-svi-study

Data repository for paper titled: "Applying Climate Change Risk Management Tools to Combine Streamflow Projections and Social Vulnerability".

This README.md file was generated on 20190410 by Sheila Saia.

This GitHub repository was created to provide access to collected data, analysis code, and other information associated with the paper by Saia et al. titled "Applying Climate Change Risk Management Tools to Combine Streamflow Projections and Social Vulnerability" in Ecosystems (Ecosystems link: https://link.springer.com/article/10.1007/s10021-019-00387-5, TreeSearch link: https://www.fs.usda.gov/treesearch/pubs/56780).

General Information

Title of Dataset
"paper-yadkin-swat-svi-study"

Dataset & Repo Contact Information
Name: Sheila Saia
Institution: United States Department of Agriculture Forest Service, Southern Research Station
Address: 3041 Cornwallis Road, Durham, NC 27709
Email: ssaia at ncsu dot edu

Date of data collection
Soil and Water Assessment Tool (SWAT) model outputs were generated in 2016 and are available on GitHub at https://github.com/sheilasaia/paper-yadkin-swat-study. United States Forest Service (USFS) land use predictions were generated in 2015 and are also available at the same GitHub link. Social vulnerability index (SVI) results (2010-2014) were downloaded from the Centers for Disease Control and Prevention Agency for Toxic Substances and Disease Registry (ATSDR) data download website in 2018. All other data originated from publicly available sites as described in the paper associated with this dataset.

Geographic location of data collection
All data is associated with the Yadkin-Pee Dee River Watershed (YPD) in North Carolina, USA.

Information about funding sources that supported the collection of the data
Sheila Saia was supported by funding through the Oak Ridge Institute for Science and Education (ORISE).

Sharing & Access Information

Licenses/restrictions placed on the data
Please use and distribute according to CC-BY v4.0. For a human-readable version of this license, visit https://creativecommons.org/licenses/by/4.0/.

Links to publications that cite or use the data
SWAT simulated streamflow data was also used by Suttles et al. (2018).

Links to other publicly accessible locations of the data
This dataset and associated R code are available at https://github.com/sheilasaia/paper-yadkin-swat-svi-study and via Zenodo. The associated publication is available via Ecosystems and via TreeSearch.

Links/relationships to ancillary data sets
All links to publicly available data are described here, in Saia et al. (2019), and in Suttles et al. (2018).

Data derived from another source
All links to publicly available data are described here, in Saia et al. (2019), and in Suttles et al. (2018).

Additional related data collected that was not included in the current data package
This directory does not include all environmental data required to run and calibrate the SWAT model developed by Suttles et al. (2018). For this information, visit the GitHub repository associated with Suttles et al. (2018): https://github.com/sheilasaia/paper-yadkin-swat-study.

Are there multiple versions of the dataset?
All publicly available data are described here, in Saia et al. (2019), and in Suttles et al. (2018). With respect to simulated data and data analysis scripts, there are no other versions available online.

Recommended citation for the data
Saia, S.M., K.M. Suttles, B.B. Cutts, R.E. Emanuel, K.L. Martin, D.N. Wear, J.W. Coulston, J.M. Vose. 2019. Applying Climate Change Risk Management Tools to Integrate Streamflow Projections and Social Vulnerability. Ecosystems. 23:67-83. Ecosystems link: https://link.springer.com/article/10.1007/s10021-019-00387-5, TreeSearch link: https://www.fs.usda.gov/treesearch/pubs/56780

Paper Availability
The paper is available online via Ecosystems and TreeSearch. If you do not have a subscription to the journal or are having trouble accessing it, please contact Sheila Saia directly for a copy of the pre-print.

Data & File Overview

This repository is organized into two main directories: (1) swat_svi_python_analysis and (2) swat_svi_r_analysis.

1. swat_svi_python_analysis directory

The swat_svi_python_analysis directory contains two subdirectories: data and scripts. The data directory includes spatial data and a scratch directory needed for storing temporary outputs. The scripts directory includes Python scripts (.py files) for calculating percent land cover for each subbasin and scaling SVI data from the census tract scale to the subbasin scale.

1.1 data

Directory name: data
Short description: This directory contains the spatial data directory.

1.1.2 spatial

Directory name: spatial
Short description: This directory contains spatial data and the scratch directory.

File List
Filename: yadkin_subs_albers.shp
Short description: This shape file contains the 28 YPD subbasin boundaries as delineated by SWAT. This file was generated by SWAT and then projected to the United States of America Contiguous Albers Equal Area Conic USGS projection to ensure more accurate area calculations later on.

Filename: yadtracts_30m
Short description: This raster file contains the boundaries of census tracts within the YPD watershed at a 30 x 30m resolution.

Filename: yadlu_1992.tif
Short description: This raster file contains the 1992 National Land Cover Dataset (NLCD) land use classes for the YPD watershed at a 30 x 30m resolution.

Filename: yadlurec_1992
Short description: This raster file was created by reclassifying (i.e., combining) land use categories from yadlu_1992.tif so they could be compared to the 2060 data. We used the ESRI ArcGIS raster calculator and exported the summary table to calculate watershed-wide land use classes and save these data to yadkin_lu_baseline_reclass_1992.txt (see 2.5.2).

Filename: yadluA_2060.tif
Short description: This raster file represents the 2060 land use under the MIROC 8.5 scenario as described in Saia et al. (2019) and Suttles et al. (2018). These data came from the USFS Southern Research Station's Forest Futures report as described in further detail in Saia et al. (2019) and Suttles et al. (2018). We used the ESRI ArcGIS raster calculator and exported the summary table to calculate watershed-wide land use classes and save these data to yadkin_lu_miroc8_5_2060.txt (see 2.5.2).

Filename: yadluB_2060.tif
Short description: This raster file represents the 2060 land use under the CSIRO 8.5 scenario as described in Saia et al. (2019) and Suttles et al. (2018). These data came from the USFS Southern Research Station's Forest Futures report as described in further detail in Saia et al. (2019) and Suttles et al. (2018). We used the ESRI ArcGIS raster calculator and exported the summary table to calculate watershed-wide land use classes and save these data to yadkin_lu_csiro8_5_2060.txt (see 2.5.2).

Filename: yadluC_2060.tif
Short description: This raster file represents the 2060 land use under the CSIRO 4.5 scenario as described in Saia et al. (2019) and Suttles et al. (2018). These data came from the USFS Southern Research Station's Forest Futures report as described in further detail in Saia et al. (2019) and Suttles et al. (2018). We used the ESRI ArcGIS raster calculator and exported the summary table to calculate watershed-wide land use classes and save these data to yadkin_lu_csiro4_5_2060.txt (see 2.5.2).

Filename: yadluD_2060.tif
Short description: This raster file represents the 2060 land use under the Hadley 4.5 scenario as described in Saia et al. (2019) and Suttles et al. (2018). These data came from the USFS Southern Research Station's Forest Futures report as described in further detail in Saia et al. (2019) and Suttles et al. (2018). We used the ESRI ArcGIS raster calculator and exported the summary table to calculate watershed-wide land use classes and save these data to yadkin_lu_hadley4_5_2060.txt (see 2.5.2).

Filename: scratch directory
Short description: The scratch directory is an intentionally empty directory that is used to hold temporary (intermediate) files generated by the .py scripts in the scripts directory (see 1.2).

Relationship Between Files
These files were used to compute the percentage of each land use in the baseline (1982-2002) and projected (2050-2070) periods. The Python files used to do these calculations are explained in further detail in 1.2, and tabular data (.txt files) are described in further detail in 2.5.2. These results were then compared to assess land use change in the YPD.
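As a rough illustration of the percent-area calculation described above, the sketch below tallies 30 m raster cells by land use class for one subbasin and converts the counts to percentages. It is not the arcpy workflow used by the scripts in 1.2; the function name `percent_area_by_class`, the class labels, and the cell counts are all hypothetical.

```python
from collections import Counter

CELL_AREA_M2 = 30 * 30  # each raster cell covers 30 m x 30 m


def percent_area_by_class(cell_classes):
    """Return {land use class: percent of total area} for one subbasin.

    cell_classes is an iterable with one land use label per raster cell
    (hypothetical input; the real scripts read rasters with arcpy).
    """
    counts = Counter(cell_classes)
    total_area = sum(counts.values()) * CELL_AREA_M2
    if total_area == 0:
        return {}
    return {
        lu_class: 100.0 * n_cells * CELL_AREA_M2 / total_area
        for lu_class, n_cells in counts.items()
    }


# Made-up cells for a single subbasin
example_cells = ["forest"] * 6 + ["developed"] * 3 + ["wetland"] * 1
example_percents = percent_area_by_class(example_cells)
print(example_percents)
```

Because every cell has the same area, the cell area cancels out of the ratio; it is kept here only to make the link to the 30 x 30 m rasters explicit.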

Raw Data
The directory does not contain raw data but data sources are explained in the README.md file contained within the spatial directory.

1.2 scripts

Directory name: scripts
Short description: This directory contains six Python scripts that were used to automate land use percentage calculations and SVI scaling for each subbasin in the YPD.

File List
Filename: lu_baseline_1992_area_calcs.py
Short description: This Python script automates the percent area calculation for various 1992 NLCD land use types within each of the 28 subbasins. This script generates the lu_baseline_1992_allsubs.csv file (see 2.5.2).

Filename: lu_miroc8_5_2060_area_calcs.py
Short description: This Python script automates the percent area calculation for various 2060 MIROC 8.5 scenario land use types within each of the 28 subbasins. This script generates the lu_miroc8_5_2060_allsubs.csv file (see 2.5.2).

Filename: lu_csiro8_5_2060_area_calcs.py
Short description: This Python script automates the percent area calculation for various 2060 CSIRO 8.5 scenario land use types within each of the 28 subbasins. This script generates the lu_csiro8_5_2060_allsubs.csv file (see 2.5.2).

Filename: lu_csiro4_5_2060_area_calcs.py
Short description: This Python script automates the percent area calculation for various 2060 CSIRO 4.5 scenario land use types within each of the 28 subbasins. This script generates the lu_csiro4_5_2060_allsubs.csv file (see 2.5.2).

Filename: lu_hadley4_5_2060_area_calcs.py
Short description: This Python script automates the percent area calculation for various 2060 Hadley 4.5 scenario land use types within each of the 28 subbasins. This script generates the lu_hadley4_5_2060_allsubs.csv file (see 2.5.2).

Filename: svibd_2014_scaling_calcs.py
Short description: This Python script automates SVI scaling calculations for each of the 28 YPD subbasins. These proportions (saved in svibd_2014_scaling_allsubs.csv, see 2.5.2) are needed to convert census tract SVI result to the subbasin scale.

Relationship Between Files
The lu*.py scripts are used to calculate the percent area of a given land use type for each of the 28 subbasins in the YPD, and results are exported to the tabular data directory for R analysis (see 2.5.2). The svibd_2014_scaling_calcs.py script calculates the percentage of subbasin area that each census tract takes up for scaling purposes (see 2.3) and generates svibd_2014_scaling_allsubs.csv to do so (see 2.5.2). The scratch directory is intentionally empty but holds temporary files when the .py scripts in the scripts directory are executed.
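One way to picture how these area percentages could be used for scaling is an area-weighted mean, sketched below. This assumes the subbasin SVI is the weighted average of tract SVI values, with each tract weighted by the fraction of the subbasin it covers; the exact scaling used in the study is described in Saia et al. (2019) and implemented in svi_analysis.R. The function name, tract ids, and values are hypothetical.

```python
def scale_svi_to_subbasin(tract_svi, tract_fraction):
    """Area-weighted mean SVI for one subbasin (assumed scaling method).

    tract_svi: {tract id: SVI value}
    tract_fraction: {tract id: fraction of the subbasin's area covered by that tract}
    Both inputs are hypothetical stand-ins for columns in the CSV files.
    """
    weight_sum = sum(tract_fraction.values())
    if weight_sum <= 0:
        raise ValueError("tract fractions must sum to a positive value")
    weighted = sum(tract_svi[tract] * w for tract, w in tract_fraction.items())
    # Dividing by the weight sum guards against fractions that do not add to exactly 1
    return weighted / weight_sum


# Made-up example: two census tracts overlapping one subbasin
example_svi = {"tract_a": 0.2, "tract_b": 0.8}
example_fraction = {"tract_a": 0.75, "tract_b": 0.25}
example_subbasin_svi = scale_svi_to_subbasin(example_svi, example_fraction)
print(example_subbasin_svi)
```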

Raw Data
The directory does not contain raw data but does rely on raw data provided by publicly available sources as explained in the README.md file contained within the spatial directory.

2. swat_svi_r_analysis directory

The swat_svi_r_analysis directory contains R scripts as well as three subdirectories: data, figures, and functions.

2.1 hiflow_analysis.R

Filename: hiflow_analysis.R
Short description: This R script reformats SWAT output.rch files and uses them to calculate the percent change in the number of 10yr and outlier streamflow events as described by Saia et al. (2019).

Relationship Between Files
This R script requires several R functions (see 2.6), raw SWAT outputs (output.rch files, see 2.5.2), and the YPD subbasin boundary shape file yadkin_subs_utm17N.shp (see 2.5.1). This R script generates tabular data (i.e., hiflow_10yr_change_calcs.csv and hiflow_outlier_change_calcs.csv described in 2.5.2) and figures presented in Saia et al. (2019).
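The percent change in the number of high flow events can be sketched as follows. This is a simplified Python illustration, not a translation of hiflow_analysis.R: it assumes a single flow threshold (for example, the baseline 10yr flow) applied to both periods, and the function names and flow values are hypothetical.

```python
def count_exceedances(flows, threshold):
    """Count flows greater than or equal to the threshold."""
    return sum(1 for q in flows if q >= threshold)


def percent_change_in_events(baseline_flows, projected_flows, threshold):
    """Percent change in the number of events at or above the threshold.

    Returns None when the baseline has no events, because the percent
    change is undefined in that case.
    """
    n_base = count_exceedances(baseline_flows, threshold)
    n_proj = count_exceedances(projected_flows, threshold)
    if n_base == 0:
        return None
    return 100.0 * (n_proj - n_base) / n_base


# Made-up daily flows (same units) for one subbasin
example_baseline = [10, 50, 120, 30, 150]
example_projected = [20, 130, 160, 140, 40]
example_change = percent_change_in_events(example_baseline, example_projected, 100)
print(example_change)
```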

Raw Data
The file does not contain raw data but relies on raw data stored in the tabular directory (see 2.5.2).

2.2 svi_reformatting.R

Filename: svi_reformatting.R
Short description: This R script reformats raw SVI data (2010-2014 period) for the US and generates the us_svi_2014_albers_reformatted.csv file (see 2.5.2).

Relationship Between Files
This R script generates the us_svi_2014_albers_reformatted.csv file (see 2.5.2), which is required to run the svi_analysis.R script (see 2.3).

Raw Data
The directory does not contain raw data but relies on raw data obtained from publicly available datasets as described in the spatial directory README file (see 2.5.1).

2.3 svi_analysis.R

Filename: svi_analysis.R
Short description: This R script analyzes SVI data for the YPD, combines SVI data with SWAT outputs as described by Saia et al. (2019), and generates figures for this publication.

Relationship Between Files
This R script requires several R functions (see 2.6), svibd_2014_scaling_allsubs.csv (see 1.2 and 2.5.2), us_svi_2014_albers_reformatted.csv (see 2.5.2), and several shape files (see 2.5.1). It generates data and figures as presented in Saia et al. (2019). Figures are exported to the figures directory (see 2.7).

Raw Data
The directory does not contain raw data.

2.4 landuse_analysis.R

Filename: landuse_analysis.R
Short description: This R script calculates percent change in land use between the baseline and projected periods for each subbasin in the YPD as presented in Saia et al. (2019).

Relationship Between Files
This R script requires several R functions (see 2.6), outputs from the Python land use scripts (see 1.2 and 2.5.2), raw SWAT outputs for baseline conditions (true_baseline_output.rch, see 2.5.2), and the yadkin_subs_utm17N.shp shape file (see 2.5.1). This R script generates data and figures that are presented in Saia et al. (2019). Figures are exported to the figures directory (see 2.7).

Raw Data
The directory does not contain raw data.

2.5 data

Directory name: data
Short description: This directory contains the spatial and tabular data directories. These data are required for R data analysis.

2.5.1 spatial

Directory name: spatial
Short description: This directory contains spatial data.

File List
Filename: yadkin_subs_utm17N.shp
Short description: This shape file includes each of the 28 YPD subbasin boundaries. For further details on where this information was obtained, see the README file in the spatial directory.

Filename: yadkin_svi2014_utm17N.shp
Short description: This shape file includes the census tract SVI data clipped to the YPD boundary. For further details on where this information was obtained, see the README file in the spatial directory.

Filename: yadkin_counties_svi2014_utm17N.shp
Short description: This shape file includes the census tract SVI data clipped to the county boundaries that touch the YPD boundary. For further details on where this information was obtained, see the README file in the spatial directory.

Filename: yadkin_majortribs_utm17N.shp
Short description: This shape file includes the major rivers of the YPD. For further details on where this information was obtained, see the README file in the spatial directory.

Filename: yadkin_unclip_counties_utm17N.shp
Short description: This shape file includes the county boundaries overlapping the YPD. For further details on where this information was obtained, see the README file in the spatial directory.

Relationship Between Files
These files are required to run some of the R scripts (see 2.1-2.4). For further details on where this information was obtained, see the README file in the spatial directory.

Raw Data
The directory does not contain raw data.

2.5.2 tabular

Directory name: tabular
Short description: This directory contains tabular data.

File List
Filename: *output.rch files
Short description: These output.rch files were generated by SWAT as explained by Suttles et al. (2018) and Saia et al. (2019). The tabular data directory includes SWAT output.rch files for the baseline (true_baseline_output.rch, miroc_backcast_baseline_output.rch, csiro_backcast_baseline_output.rch, hadley_backcast_baseline_output.rch) and for each of the four projections (miroc8_5_projected_output.rch, csiro8_5_projected_output.rch, csiro4_5_projected_output.rch, hadley4_5_projected_output.rch).

Filename: yadkin_lu_*.txt files
Short description: These files (yadkin_lu_baseline_reclass_1992.txt, yadkin_lu_miroc8_5_2060.txt, yadkin_lu_csiro8_5_2060.txt, yadkin_lu_csiro4_5_2060.txt, yadkin_lu_hadley4_5_2060.txt) were obtained by exporting the summary tables from baseline and projected land use data (see 1.1.2).

Filename: lu_*_allsubs.csv files
Short description: These files (lu_baseline_1992_allsubs.csv, lu_miroc8_5_2060_allsubs.csv, lu_csiro8_5_2060_allsubs.csv, lu_csiro4_5_2060_allsubs.csv, lu_hadley4_5_2060_allsubs.csv) were generated by Python scripts (see 1.2) and are used in the landuse_analysis.R script (see 2.4).

Filename: kn_table_appendix4_usgsbulletin17b.csv
Short description: This file was created by converting the kn table in Appendix 4 from the USGS Bulletin 17b as referenced in Saia et al. (2019).

Filename: us_svi_2014_albers.txt
Short description: This file was derived from spatial SVI data obtained from the ATSDR data download website for all census tracts in the United States but does not include spatial data (i.e., it was obtained by exporting the attribute data in ESRI ArcGIS). For further details on where this information was obtained, see the README file in the spatial directory (see 2.5.1). It was projected to the United States of America Contiguous Albers Equal Area Conic USGS projection using ESRI ArcGIS; we did not check the 'preserve shape' box.

Filename: us_svi_2014_albers_reformatted.txt
Short description: This file is the reformatted version of us_svi_2014_albers.txt. Reformatting was done using the svi_reformatting.R script (see 2.2).

Filename: svibd_2014_scaling_allsubs.csv
Short description: This file includes the scaling factors to convert census tract SVI data to the subbasin scale. It was generated using the svibd_2014_scaling_calcs.py script described in 1.2.

Filename: hiflow_10yr_change_calcs.csv
Short description: This file includes the percent change in number of 10yr flows as calculated using the hiflow_analysis.R script (see 2.1).

Filename: hiflow_outlier_change_calcs.csv
Short description: This file includes the percent change in number of outlier flows as calculated using the hiflow_analysis.R script (see 2.1).

Relationship Between Files
These tabular files are required to run the various R scripts described in 2.1-2.4.

Raw Data
The directory does not contain raw data.

2.6 functions

Directory name: functions
Short description: This directory contains the home-made R functions needed to run the .R scripts (see 2.1-2.4).

File List
Filename: count_hiflow_outliers_using_baseline.R
Short description: This R function finds outliers in the SWAT baseline output.rch files for high flow risk analysis.

Filename: count_hiflow_outliers.R
Short description: This R function finds outliers in projected SWAT output.rch file for high flow risk analysis.

Filename: flow_change.R
Short description: This R function calculates the percent change in the number of flows between baseline and projection datasets for a given return period.

Filename: logperson3_factor_calc.R
Short description: This R function calculates log-Pearson type III frequency factor (kt) for high flow frequency analysis of streamflow data (i.e., output.rch files).

Filename: model_freq_calcs_one_rch.R
Short description: This R function generates log-Pearson type III model curves for high flow frequency analysis of one subbasin.

Filename: model_freq_calcs_all_rchs.R
Short description: This R function generates log-Pearson type III model curves for high flow frequency analysis of all subbasins.

Filename: multiplot.R
Short description: This R function enables plotting of multiple ggplot objects in one layout.

Filename: obs_freq_calcs_one_rch.R
Short description: This R function selects observations for high flow frequency analysis of one subbasin.

Filename: obs_freq_calcs_all_rchs.R
Short description: This R function selects observations for high flow frequency analysis of all subbasins.

Filename: outlier_change.R
Short description: This R function calculates the percent change in number of minor and major outliers between baseline and projection datasets.

Filename: reformat_rch_file.R
Short description: This R function prepares (reformats) SWAT output.rch files for high flow frequency and high outlier flow analysis.

Filename: remove_outliers.R
Short description: This R function identifies and removes statistically significant high outliers and then returns a new data frame without them.

Filename: rp_n_flow_change.R
Short description: This R function determines the percent change in number of flows greater than or equal to a specified return period between the baseline and projection datasets.

Relationship Between Files
These functions are required to run the .R scripts (see 2.1-2.4).
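For readers unfamiliar with log-Pearson type III frequency factors, the sketch below shows one common way to approximate them in Python. It uses the Kite polynomial approximation found in hydrology textbooks, which is an assumption on our part and may differ from what logperson3_factor_calc.R does. The function names and inputs are hypothetical, and `statistics.NormalDist` requires Python 3.8 or later (not the Python 2.7 used by the scripts in 1.2).

```python
from statistics import NormalDist


def lp3_frequency_factor(return_period_yr, skew):
    """Approximate log-Pearson type III frequency factor (Kite approximation).

    return_period_yr: return period T in years (must be > 1)
    skew: skew coefficient of the log-transformed flows
    """
    if return_period_yr <= 1:
        raise ValueError("return period must be greater than 1 year")
    # Standard normal variate with exceedance probability 1/T
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period_yr)
    k = skew / 6.0
    return (
        z
        + (z**2 - 1.0) * k
        + (z**3 - 6.0 * z) * k**2 / 3.0
        - (z**2 - 1.0) * k**3
        + z * k**4
        + k**5 / 3.0
    )


def lp3_flow(mean_log10_q, sd_log10_q, skew, return_period_yr):
    """Flow for a return period: log10(Q_T) = mean + K_T * sd."""
    k_t = lp3_frequency_factor(return_period_yr, skew)
    return 10.0 ** (mean_log10_q + k_t * sd_log10_q)


# Made-up statistics of log10-transformed annual peak flows for one subbasin
example_q10 = lp3_flow(mean_log10_q=2.0, sd_log10_q=0.3, skew=0.5, return_period_yr=10)
print(example_q10)
```

With zero skew the factor reduces to the standard normal variate, which gives a quick sanity check on any implementation.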

Raw Data
The directory does not contain raw data.

2.7 figures

Directory name: figures
Short description: This directory is left intentionally empty to store figure outputs from the .R scripts (see 2.1-2.4).

Relationship Between Files
There are intentionally no files in this directory.

Raw Data
The directory does not contain raw data.

Methodological Information

Description of methods used for collection/generation of data:
See the associated Ecosystems journal article for a full description of the methods used to collect and analyze these data.

Methods for processing the data:
See the R and Python scripts in this repository as well as the associated Ecosystems journal article for a full description of the methods used to collect and analyze these data.

Instrument- or software-specific information needed to interpret the data:
R (open-source, version 3.4.3, https://www.r-project.org/) is needed to run .R files, Python (open-source, version 2.7, https://www.python.org/) is needed to run .py files, and an ESRI ArcGIS (version 10.4.1, http://desktop.arcgis.com/en/) license is required to run Python scripts that use the arcpy library. Land use data (.tif) and shape files (.shp) can be opened using ESRI ArcGIS or QGIS (open-source, version 2.18, https://qgis.org/en/site/). R, Python, or an all-purpose text editor can be used to open .csv and .txt files.

Standards and calibration information, if appropriate:
See Suttles et al. (2018) for details on SWAT model calibration.

Environmental/experimental conditions:
See the associated Ecosystems journal article and Suttles et al. (2018) for a full description of observed and modeled data used in this study.

Describe any quality-assurance procedures performed on the data:
SWAT simulations were calibrated and validated. This is described in further detail in Suttles et al. (2018). When possible, data analysis was automated in R and Python to ensure consistency.

People involved with sample collection, processing, analysis and/or submission:
See the associated Ecosystems journal article for a full description of author contributions and acknowledgments.

Data-Specific Information For Tabular Data

Variable list
Variable descriptions for *output.rch files are included in the GitHub repository associated with Suttles et al. (2018), found at https://github.com/sheilasaia/paper-yadkin-swat-study, and in README files within the associated tabular data directory. SVI, land use, and SWAT model variable listings are described in the associated tabular data directory README file (see swat_svi_r_analysis > data > tabular). For further descriptions of SVI data variables, see the ATSDR data download website. For further description of NLCD data variables, see the NLCD website. For further description of SWAT output.rch file variables, see the SWAT documentation (https://swat.tamu.edu/media/69395/ch32_output.pdf).

Missing data codes
'NA' indicates missing data unless otherwise noted.
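When reading the tabular files outside of R, the 'NA' code needs to be handled explicitly. A minimal sketch using Python's standard csv module is shown below; the function name, column names, and values are hypothetical and do not correspond to a specific file in this repository.

```python
import csv
import io


def read_rows_with_na(text_stream, na_code="NA"):
    """Read CSV rows as dictionaries, converting the missing-data code to None."""
    reader = csv.DictReader(text_stream)
    return [
        {key: (None if value == na_code else value) for key, value in row.items()}
        for row in reader
    ]


# Made-up two-row table standing in for a file from the tabular directory
example_csv = io.StringIO("SUB,perc_change\n1,12.5\n2,NA\n")
example_rows = read_rows_with_na(example_csv)
print(example_rows)
```

To read a real file, pass an open file handle (for example, `open(path, newline="")`) in place of the `io.StringIO` object.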