/psp-tek-1

Repository for the open-source EET prototype developed by Tim, Emma, and Katrina

Effectiveness and Evaluation Tool

Tim Blankemeyer, Emma Clarke, and Katrina Gertz are a group of MLIS Candidates from the University of Washington's Information School who are interested in leveraging open data and open-source tools to help solve complex problems.

Numerous environmental restoration projects have been undertaken throughout the Puget Sound, but connecting investments in these projects to co-located indicators of habitat viability is challenging. For this Open Data Literacy (ODL) Capstone project, we've collaborated with the Puget Sound Partnership, the Governor's Salmon Recovery Office, South Sound Spatial, and other partners to design a scalable data cleaning and analysis pipeline as well as an interactive visualization prototype to show what's working to restore Puget Sound.

CAVEAT: This is a prototype to show what a web-based analysis tool might look like. The underlying data for water quality and salmon were sourced from public web sites. Data and results have not been vetted or approved; that is the next step.

See the prototype at https://ejclarke.shinyapps.io/capstone/.

Contact

Any questions or comments can be directed to Leska Fore from the Puget Sound Partnership at leska.fore@psp.wa.gov.

Data Sources

As mentioned in the Description, the underlying data for the prototype project were sourced from public web sites.

Project Investment Data

Water Quality Data

Hood Canal Summer Chum Salmon Data

Hydrologic Unit Shapefile Data

Project tools

This project was completed using open-source software tools, underpinned by the statistical programming language R. Follow the links below to learn more about the tools used, including installation instructions.

  • The R Project for Statistical Computing: R is a free software environment for statistical computing and graphics.
  • RStudio: Open source and enterprise-ready professional integrated development environment (IDE) for R.
  • Leaflet for R: Leaflet is one of the most popular open-source JavaScript libraries for interactive maps. This R package makes it easy to integrate and control Leaflet maps in R.
  • Shiny: A web application framework for R.
  • Additional R packages used:
    • tidyverse: An ecosystem of packages designed with common APIs and a shared philosophy.
    • forcats: A suite of useful tools that solve common problems with factors.
    • ggplot2: A system for declaratively creating graphics.
    • rgdal: Provides bindings for the Geospatial Data Abstraction Library.
    • spdplyr: Data manipulation verbs for the spatial data classes.
    • MazamaSpatialUtils: A suite of conversion scripts to create internally standardized spatial polygons dataframes.
    • rmapshaper: Edit and simplify 'geojson' and 'Spatial' objects.
    • stringr: Simple, consistent wrappers for common string operations.
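
Before running any of the scripts, it can help to check which of the packages above are already installed. The following is a minimal sketch, not part of the project's code: the package names `leaflet` and `shiny` are assumed to correspond to "Leaflet for R" and "Shiny" above, and the snippet only reports missing packages rather than installing them. Note that some of these packages may no longer install with a plain `install.packages()` call (rgdal, for instance, was retired from CRAN in 2023).

```r
# Packages named in this README ("Leaflet for R" and "Shiny" are assumed to
# load as "leaflet" and "shiny").
pkgs <- c("leaflet", "shiny", "tidyverse", "forcats", "ggplot2", "rgdal",
          "spdplyr", "MazamaSpatialUtils", "rmapshaper", "stringr")

# installed.packages() lists every package found in the current library paths.
installed <- setNames(pkgs %in% rownames(installed.packages()), pkgs)

# Keep only the names that were not found.
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```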

Tool usage and data pipeline

Note: Scripts in bold are used in the current application.

Scripts created for the data cleaning and analysis pipeline:

  • Project investment data: https://github.com/katger4/psp-tek/tree/master/investments

    • hoodcanal_investments.R: Tidy and merge EAGL and PRISM project data; assign HUC-10 and HUC-12 values to all projects using addHUC function.
    • cohensD_functions.R: Functions to apply Cohen's D calculations and categorizations to chum salmon, turbidity, and TSS outcomes, aggregated by HUC.
    • combine_data.R: Connect chum salmon and water quality measures to investments at HUC-10 and HUC-12 levels, using median investment year calculation; calculate Cohen's D values for each outcome using cohensD_functions.R functions; merge all data and save as RDS for input to Shiny app.
    • all_map.R: Test code for integrating investments and outcome data into Leaflet.
    • state_hws.R: Example of integration of statewide Habitat Work Schedule (HWS) project data into pipeline, showing scalability.
    • combine_state_data.R: Example of visualization of statewide HWS investments and water quality outcomes, showing scalability.
  • Chum salmon data: https://github.com/katger4/psp-tek/tree/master/chum

    • chum_loc_and_counts.R: Loads and prepares chum salmon geospatial and measurement data for use in pipeline; assigns HUC-10 and HUC-12 values to all sites.
  • Water quality data: https://github.com/katger4/psp-tek/tree/master/water

    • water.R: Loads and prepares TSS and turbidity data for use in pipeline; assigns HUC-10 and HUC-12 values to all data.
    • addHUC.R: Function to add HUC-10 and HUC-12 values based on lat-lon; uses MazamaSpatialUtils package.
    • state_turbidity.R: Example of integration of statewide turbidity data into pipeline, showing scalability.
    • water_EDA.R: Exploratory Data Analysis on water quality measures.
    • water_maps.R: Visualizations based on initial water quality EDA.
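
The before/after comparison described for combine_data.R and cohensD_functions.R can be sketched as follows. This is an illustration under stated assumptions, not the code in those scripts: it uses the textbook Cohen's d with a pooled variance, it drops measurements taken in the median investment year itself, and the function name `cohens_d_before_after` and the sample data are hypothetical.

```r
# Hypothetical helper: Cohen's d for measurements before vs. after a split year.
# Assumption: measurements from the split year itself are excluded.
cohens_d_before_after <- function(year, value, median_year) {
  before <- value[year < median_year]
  after  <- value[year > median_year]
  n_b <- length(before)
  n_a <- length(after)

  # Pooled variance: each period's variance weighted by its degrees of freedom.
  var_pooled <- ((n_b - 1) * var(before) + (n_a - 1) * var(after)) /
    (n_b + n_a - 2)

  list(
    mean_before = mean(before),
    mean_after  = mean(after),
    var_pooled  = var_pooled,
    cohensd     = (mean(after) - mean(before)) / sqrt(var_pooled)
  )
}

# Made-up yearly measurements for a single site.
site <- data.frame(
  year        = 2000:2009,
  measurement = c(10, 12, 11, 13, 12, 14, 15, 17, 16, 18)
)

res <- cohens_d_before_after(site$year, site$measurement, median_year = 2005)
str(res)
```

A positive value means the after-period mean is higher than the before-period mean; whether that counts as an improvement depends on the outcome (more chum salmon is good, more turbidity is not).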

Scripts created for shapefile processing:

  • Shapefile processing: https://github.com/katger4/psp-tek/tree/master/shapefileprep
    • hood_canal_shp.R: Preparation of Hood Canal HUC-10 and HUC-12 shapefiles for integration into Leaflet and Shiny app; uses shp2r.R function.
    • shp2r.R: Function to convert each HUC-level shapefile into a SpatialPolygonsDataFrame.
    • state_shp.R: Example of extending shapefile conversion and plotting with Leaflet, showing scalability.

Scripts created for the Shiny web-based visualization prototype:

  • Shiny web app: https://github.com/katger4/psp-tek/tree/master/shinyapp
    • app.R: Shiny app code for creating interactive web application visualizations.
    • about.html: HTML code that is integrated into Shiny app "About" tab.
    • state_tab.R: Example code to create Shiny app/tab showing statewide turbidity Cohen's D data.

Key terminology

  • Chum salmon: Oncorhynchus keta, a species of anadromous fish in the salmon family.
  • TSS, or Total Suspended Solids: Solid materials, both organic and inorganic, that are suspended in water.
  • Turbidity: The measure of relative clarity of a liquid.
  • HUC, or Hydrologic Unit Code: A unique code identifying a hydrologic unit such as a region, sub-region, watershed, or catchment. Smaller HUC units are nested inside larger HUC units. Analysis for this project, for instance, is at the HUC-10 and HUC-12 levels, with HUC-12 units nested inside HUC-10 units.
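
Because each HUC level adds two digits to its parent's code, the HUC-10 unit containing a HUC-12 unit can be read off the first ten digits. A minimal sketch in base R, using made-up 12-digit codes rather than values from the project data:

```r
# Made-up HUC-12 codes for illustration. Keep codes as character strings so
# that leading zeros are not lost.
huc12 <- c("171100180101", "171100180102", "171100180201")

# The parent HUC-10 code is the first ten digits of the HUC-12 code.
huc10 <- substr(huc12, 1, 10)

# Group the HUC-12 units under their parent HUC-10 unit.
nested <- split(huc12, huc10)
print(nested)
```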

Data download variable definitions

The project's web-based visualization prototype includes functionality to download CSV files for the investment and outcome data sets used to create the visualizations. Here are definitions for all four of the downloadable data sets (the turbidity and TSS data sets share one set of definitions):

Chum Salmon

  • year: Measurement year
  • site: Measurement site and type, corresponding to label used by WA Dept. of Fish & Wildlife
  • name: Measurement site, isolated
  • project_cat: Measurement type, isolated
  • lon: Measurement site longitude
  • lat: Measurement site latitude
  • description: Measurement site description, taken from WA Dept. of Fish & Wildlife web site
  • HUC_id: Hydrologic Unit Code for measurement location
  • HUC_Name: Hydrologic Unit Name for measurement location
  • medianyr: Median investment project year for HUC containing measurement site
  • cohensd: Cohen's D value for measurement site
  • site_mean_after: Mean of measurements after medianyr
  • site_mean_before: Mean of measurements before medianyr
  • site_sd_before: Standard deviation (SD) of measurements before medianyr
  • site_sd_after: Standard deviation (SD) of measurements after medianyr
  • var_pooled: Pooled variance
  • var_cohensd: Variance of Cohen's D value for measurement site
  • sd_cohensd: Standard deviation (SD) of Cohen's D value for measurement site
  • wsubi: (intermediate change statistic calculation)
  • wsubixd: (intermediate change statistic calculation)
  • cohensd_huc_mean: Mean Cohen's D value of measurements aggregated by HUC
  • huc_mean_after: Mean of measurements after medianyr, aggregated by HUC
  • huc_mean_before: Mean of measurements before medianyr, aggregated by HUC
  • sum_wsubixd: (intermediate change statistic calculation)
  • sum_wsubi: (intermediate change statistic calculation)
  • cohensd_huc_var: Variance of Cohen's D value of measurements aggregated by HUC
  • cohensd_huc_sd: Standard deviation (SD) of Cohen's D value of measurements aggregated by HUC
  • plus_minus: (intermediate change statistic calculation)
  • TimePeriod: Categorical value placing measurement before, during, or after medianyr
  • measurement: Measurement at given site in given year
  • effectsize: Categorical label for mean Cohen's D value in given HUC
  • site_effectsize: Categorical label of Cohen's D value at given measurement site
  • status: Categorical label noting change direction of mean Cohen's D value in given HUC
  • coloreffect: HEX code color assignment for given effectsize and status
  • colorblind: HEX code color assignment (colorblind-friendly version) for given effectsize and status
  • result_type: Categorical label corresponding to model variable
  • unit: Unit of measurement, if any
  • HUC_level: Hydrologic Unit Code (HUC) level, either 10 or 12, for given row of data
  • project_source: Categorical variable corresponding to project source (PRISM or EAGL) - NA for chum salmon data
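
The column names wsubi, wsubixd, sum_wsubi, and sum_wsubixd suggest an inverse-variance weighted mean of site-level Cohen's D values within each HUC, although this README does not confirm the formula. The sketch below shows that standard calculation under that assumption, with made-up site values; it is not taken from cohensD_functions.R.

```r
# Made-up site-level effect sizes and their variances in two hypothetical HUCs.
sites <- data.frame(
  HUC_id      = c("A", "A", "B"),
  cohensd     = c(0.4, 0.8, -0.3),
  var_cohensd = c(0.04, 0.16, 0.09)
)

# Assumed meaning: wsubi is the inverse-variance weight w_i, and wsubixd is
# the weight multiplied by the site's Cohen's D.
sites$wsubi   <- 1 / sites$var_cohensd
sites$wsubixd <- sites$wsubi * sites$cohensd

# Sum the weights and weighted values within each HUC.
sum_wsubi   <- tapply(sites$wsubi, sites$HUC_id, sum)
sum_wsubixd <- tapply(sites$wsubixd, sites$HUC_id, sum)

# Weighted mean effect size per HUC, with its variance and standard deviation.
huc <- data.frame(
  HUC_id           = names(sum_wsubi),
  cohensd_huc_mean = as.numeric(sum_wsubixd / sum_wsubi),
  cohensd_huc_var  = as.numeric(1 / sum_wsubi)
)
huc$cohensd_huc_sd <- sqrt(huc$cohensd_huc_var)
print(huc)
```

With this weighting, sites whose Cohen's D is estimated more precisely (smaller variance) contribute more to the HUC-level mean.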

Turbidity and TSS

  • year: Measurement year
  • name: Measurement project name
  • lon: Measurement project longitude
  • lat: Measurement project latitude
  • HUC_id: Hydrologic Unit Code for measurement project location
  • HUC_Name: Hydrologic Unit Name for measurement project location
  • medianyr: Median investment project year for HUC containing measurement site
  • cohensd_huc_mean: Mean Cohen's D value of measurements aggregated by HUC
  • huc_mean_after: Mean of measurements after medianyr, aggregated by HUC
  • huc_mean_before: Mean of measurements before medianyr, aggregated by HUC
  • TimePeriod: Categorical value placing measurement before, during, or after medianyr
  • measurement: Measurement at given site in given year
  • effectsize: Categorical label for mean Cohen's D value in given HUC
  • status: Categorical label noting change direction of mean Cohen's D value in given HUC
  • coloreffect: HEX code color assignment for given effectsize and status
  • colorblind: HEX code color assignment (colorblind-friendly version) for given effectsize and status
  • result_type: Categorical label corresponding to model variable
  • unit: Unit of measurement, if any
  • HUC_level: Hydrologic Unit Code (HUC) level, either 10 or 12, for given row of data
  • Study_ID: Unique ID for measurement project
  • full_date: Date, in mm/dd/yy format, of measurement project data point
  • Location_ID: ID connected to measurement project location
  • logMeasurement: Logarithm of the value in measurement column
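
The TimePeriod and logMeasurement columns can be derived from year, medianyr, and measurement. The following sketch uses made-up values and assumes that a measurement taken in the median investment year is labeled "during" and that the logarithm is the natural log; the README does not state the label spellings or the log base.

```r
# Made-up water quality rows for one HUC with a median investment year of 2005.
wq <- data.frame(
  year        = c(2003, 2005, 2008),
  measurement = c(4.2, 3.1, 2.5),
  medianyr    = 2005
)

# Place each measurement before, during, or after the median investment year.
wq$TimePeriod <- ifelse(wq$year < wq$medianyr, "before",
                        ifelse(wq$year > wq$medianyr, "after", "during"))

# Assumption: natural logarithm (the log base is not stated in this README).
wq$logMeasurement <- log(wq$measurement)
print(wq)
```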

Investment

  • year: Investment project year
  • name: Investment project name
  • project_cat: Investment project category
  • lon: Investment project site longitude
  • lat: Investment project site latitude
  • HUC_id: Hydrologic Unit Code for investment project location
  • HUC_Name: Hydrologic Unit Name for investment project location
  • measurement: Amount of given investment project
  • result_type: Categorical value corresponding to variable
  • unit: Unit of measurement, if any
  • HUC_level: Hydrologic Unit Code (HUC) level, either 10 or 12, for given row of data
  • Study_ID: Unique ID for investment project
  • project_source: Categorical variable corresponding to investment project source (PRISM or EAGL)
  • color: HEX code color assignment corresponding to categorical investment value (small, medium, or large)
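
The medianyr value used in the outcome data sets is described as the median investment project year for a HUC. A minimal sketch of how it could be derived from investment rows like those defined above, using made-up values and base R only (the project's own calculation in combine_data.R may differ, for example in how it handles ties or rounding):

```r
# Made-up investment rows in two hypothetical HUCs.
inv <- data.frame(
  HUC_id      = c("A", "A", "A", "B", "B"),
  year        = c(2004, 2006, 2010, 2008, 2012),
  measurement = c(50000, 120000, 30000, 200000, 80000)
)

# Median investment year per HUC.
medianyr <- aggregate(year ~ HUC_id, data = inv, FUN = median)
names(medianyr)[2] <- "medianyr"

# Total invested per HUC, for context.
total <- aggregate(measurement ~ HUC_id, data = inv, FUN = sum)
names(total)[2] <- "total_investment"

summary_by_huc <- merge(medianyr, total, by = "HUC_id")
print(summary_by_huc)
```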

Acknowledgments

We are grateful for the support we've received from our partners: