Data and R code for "Environmental DNA (eDNA) metabarcoding differentiates between micro-habitats within the rocky intertidal" (Shea & Boehm)

Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal

Meghan M. Shea & Alexandria B. Boehm

DOI

This GitHub repository contains data and R code for reproducing Shea & Boehm (2024), "Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal". If you reference this project or any results, please cite this paper (currently available as a pre-print above). If you reuse data or code from this repository, please additionally cite our Zenodo archive.

Abstract: While the utility of environmental DNA (eDNA) metabarcoding surveys for biodiversity monitoring continues to be demonstrated, the spatial and temporal variability of eDNA, and thus the limits of the differentiability of an eDNA signal, remains under-characterized. In this study, we collected eDNA samples from distinct micro-habitats (~40 m apart) in a rocky intertidal ecosystem over their exposure period in a tidal cycle. During this period, the micro-habitats transitioned from being interconnected, to physically isolated, to interconnected again. Using a well-established eukaryotic (cytochrome oxidase subunit I) metabarcoding assay, we detected 415 species across 28 phyla. Across a variety of univariate and multivariate analyses, using exclusively taxonomically-assigned data as well as all detected amplicon sequence variants (ASVs), we identified unique eDNA signals from the different micro-habitats sampled. This differentiability paralleled expected ecological gradients and increased as the sites became more physically disconnected. Our results demonstrate that eDNA biomonitoring can differentiate micro-habitats in the rocky intertidal only 40 m apart, that these differences reflect known ecology in the area, and that physical connectivity informs the degree of differentiation possible. These findings showcase the potential power of eDNA biomonitoring to increase the spatial and temporal resolution of marine biodiversity data, aiding research, conservation, and management efforts.

Where did the data used here come from?

This analysis primarily relies on the output of the Anacapa Toolkit, a pipeline for processing eDNA metabarcoding sequence data developed by Curd et al. You can read more about the Anacapa Toolkit here. We modified a containerized (Singularity) version of the Anacapa Toolkit to run on the Stanford Sherlock computing cluster; our modified version is archived on Zenodo. The raw data (FASTQ sequencing files), modified container, and scripts needed to reproduce the output analyzed here can be found in our Dryad data repository.

This analysis also requires NOAA/NOS/CO-OPS daily tide predictions from Pillar Point Harbor on 28 January 2022, downloaded as a .txt here.

What's in this repository?

The repository contains:

  • Data: the Anacapa Output and tide data described above
  • PillarPoint.Rmd: an RMarkdown file, which reproduces the full text of the Methods & Results sections of "Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal".
  • Analysis Products: the tables and supplemental information outputs from PillarPoint.Rmd, as well as the processed eDNA datasets used in the manuscript analysis and prepared for submission to GBIF (link forthcoming)
  • Figures: the image files generated by PillarPoint.Rmd for main text and supplemental figures
  • Intertidal eDNA.Rproj: the R project file, which you'll use to open the project locally in RStudio
  • PillarPoint.html: our output from a successful knit of PillarPoint.Rmd, which may be helpful for comparing against your own output or for checking package versions (knitting locally will overwrite this file unless you change the output name)
  • renv folder and renv.lock: materials needed for package management using renv (more below)

How should I use this repository?

The entire analysis can be executed by running the PillarPoint.Rmd RMarkdown file, which reproduces the full text of the Methods & Results sections of "Environmental DNA metabarcoding differentiates between micro-habitats within the rocky intertidal".

Step 1: Install R and RStudio (if you haven't already)

The code was most recently tested and updated against R 4.3.1. R is free, open-source and available for download here. We also recommend downloading RStudio, an integrated development environment (IDE) for running R, available here.

Step 2: Download our GitHub repository as an RStudio project

There are many ways to download our repository locally so that you can work with it.

To ensure you're getting the permanent archived version of our repository, we recommend downloading from Zenodo. Then, in RStudio, you can navigate to File -> Open Project... and then select Intertidal eDNA.Rproj from wherever you've saved the Zenodo download on your local computer.

Alternatively, you can download the repository via GitHub, either in the RStudio IDE or in the terminal:

In the RStudio IDE: Navigate to File -> New Project... and select Version Control -> Git. The repository URL is https://github.com/meghanmshea/intertidal-eDNA.git, the Project directory name is the name of the folder you'd like to save the project in, and Create project as a subdirectory of is the path to where on your computer you'd like that folder to live.

In the terminal: Follow the tutorial here.

Step 3: Load all of the necessary packages

We use renv to snapshot the project environment. When you open the project for the first time, before you do anything else, you should call renv::restore() to install all of the packages you need.
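As a minimal sketch of that first step, the snippet below calls renv::restore() only when renv.lock is found in the working directory. The lockfile check and the fallback installation of renv itself are our assumptions for illustration; opening the project in RStudio normally bootstraps renv for you.

```r
# Only attempt a restore when the lockfile is present in the working directory.
has_lockfile <- file.exists("renv.lock")

if (has_lockfile) {
  # Assumption: on a fresh R installation, renv itself may need installing first.
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  # Install the package versions recorded in renv.lock.
  renv::restore()
} else {
  message("renv.lock not found; open the project (or setwd() to its root) first.")
}
```

In an interactive session, renv::restore() lists the packages it plans to install and asks for confirmation before proceeding.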

If for some reason you're having issues with renv, you can also try loading the necessary packages manually using the following code:

# Use the RStudio CRAN mirror (over HTTPS) for package installation
options(repos = list(CRAN = "https://cran.rstudio.com/"))

packages <- c(
  "tidyverse",
  "sf",
  "patchwork",
  "stringr",
  "BiocManager",
  "remotes",
  "eulerr",
  "indicspecies",
  "cluster",
  "vegan",
  "chisq.posthoc.test",
  "taxize",
  "Polychrome",
  "pals",
  "spocc",
  "mregions",
  "rgbif",
  "mapview",
  "ggbeeswarm",
  "ggh4x",
  "chunkhooks",
  "devtools",
  "grid",
  "styler"
)

# Install any CRAN packages that are missing, then load them all
install.packages(setdiff(packages, rownames(installed.packages())))
invisible(lapply(packages, library, character.only = TRUE))

BM_packages <- c(
  "phyloseq",
  "ggtree",
  "ggtreeExtra"
)

# Install any missing Bioconductor packages from source
BiocManager::install(setdiff(BM_packages, rownames(installed.packages())), force = TRUE, type = "source")

if (!require("ampvis2", quietly = TRUE)) {
  remotes::install_github("kasperskytte/ampvis2")
}

packages <- append(packages, BM_packages)
packages <- append(packages, "ampvis2")
packages <- append(packages, "tidytree")

invisible(lapply(packages, library, character.only = TRUE))

Note: If installing packages manually, know that we have occasionally had bugs with new releases of packages (especially ggtreeExtra and related dependencies like ggtree and tidytree), so if code isn't running properly, it may be an issue with package versions. To see the package versions we used the last time our code was successfully run, you can scroll to the bottom of PillarPoint.html (our PillarPoint.Rmd output) and look at the sessionInfo() section.
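To compare your setup against the versions listed in PillarPoint.html, something like the following sketch may help. It prints the installed version of the three packages named in the note above; the helper name installed_version is hypothetical and not part of our code.

```r
# Packages flagged in the note above as sensitive to new releases.
fragile <- c("ggtreeExtra", "ggtree", "tidytree")

# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Named character vector: package name -> installed version (or NA).
versions <- vapply(fragile, installed_version, character(1))
print(versions)
```

Any NA means the package is not installed; any mismatch with the sessionInfo() section of PillarPoint.html is a reasonable first place to look if a tree-plotting chunk fails.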

Step 4: Work through 'PillarPoint.Rmd'

You can either run the code chunks in the console, or knit to an .html or .pdf to see the full output all together (this will take a long time to knit the first time you do it).
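If you would rather knit from the console and keep our PillarPoint.html for comparison, a sketch along these lines writes your output to a different file. The output name PillarPoint_local.html is a hypothetical choice, and we assume the working directory is the project root.

```r
input_file  <- "PillarPoint.Rmd"
output_file <- "PillarPoint_local.html"  # hypothetical name; avoids overwriting PillarPoint.html

if (file.exists(input_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Knit the full analysis to HTML under the alternative file name.
  rmarkdown::render(
    input = input_file,
    output_format = "html_document",
    output_file = output_file
  )
} else {
  message("PillarPoint.Rmd or the rmarkdown package was not found; nothing was knit.")
}
```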

Note: All of these analyses were run on a personal computer, and should be easy to reproduce. Analyses that require repeated access to an API (e.g. the GBIF range analysis) may take a long time to run. When running R code chunks, we recommend running these analyses as RStudio jobs using job::job() from the job package (more here).
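As a rough illustration of the job::job() pattern, the sketch below wraps a placeholder computation rather than any actual chunk from PillarPoint.Rmd. The availability check, the Sys.sleep() stand-in, and the variable name slow_result are all assumptions for illustration.

```r
# RStudio jobs need both the job package and a running RStudio session.
can_run_jobs <- requireNamespace("job", quietly = TRUE) &&
  requireNamespace("rstudioapi", quietly = TRUE) &&
  rstudioapi::isAvailable()

if (can_run_jobs) {
  job::job({
    # Placeholder for a slow chunk (e.g. one that repeatedly queries an API).
    Sys.sleep(5)
    slow_result <- "finished"
  })
} else {
  message("Not in RStudio (or the job package is missing); run the chunk directly instead.")
}
```

With the default settings of job::job(), objects created inside the block (here slow_result) should be returned to the main session once the job finishes, so the console stays free in the meantime.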

Problems

If you have any trouble running the code, or find any errors, please file an issue on this repo and we'll look into it. You can also email Meghan Shea.

License

The software code contained within this repository is made available under the MIT license. The data and figures are made available under the Creative Commons Attribution 4.0 license.

Acknowledgements

We are grateful to many researchers who have modeled good practices for reproducibility and reporting in their own GitHub repos. Thanks especially to Grant McDermott (bycatch project), Alexa Fredston (marine_heatwaves_trawl project), and Ramon Gallego (eDNA.and.Ocean.Acidification.Gallego.et.al.2020 project).