marta-eDNA data analysis

Deciphering community structure from intertidal Puget Sound water samples

My project analyzes data published by Ryan Kelly and his team at SMEA. Starting from the raw data (two files containing the forward and reverse sequences), I will determine which taxa are present, following their established protocols and pipeline.

My goals are:

  • To learn how to use the platforms my project requires, such as Markdown, GitHub, Git Bash, and Jupyter
  • To get familiar with the specific programs needed to clean, merge, cluster, and classify the data (PEAR, usearch, cutadapt, seqtk, blastn, and MEGAN)
  • To apply what I learn during the quarter to analyze my own data once I collect my samples in December 2018

Objectives

  1. Merge paired-end reads with R
  2. Quality filter with R
  3. Remove primers with R
  4. Reverse complement appropriate sequences with seqtk
  5. Remove sequences containing homopolymers (BSD Unix grep and awk)
  6. Consolidate identical sequences with usearch
  7. Remove singletons with usearch
  8. Cluster sequences into OTUs using usearch
  9. BLAST clusters using blastn
  10. Assign taxonomy by lowest common ancestor (LCA) grouping in MEGAN
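Step 2 does not name a specific quality criterion. One widely used option is the expected-error filter behind usearch's `-fastq_filter` command (option `-fastq_maxee`): a read's expected number of errors is E = sum(10^(-Q/10)) over its Phred scores Q, and reads with E above a threshold are discarded. The Python sketch below illustrates that idea only; it is not the R code the project will use. It assumes Phred+33 quality encoding, and the `max_ee` and `min_len` defaults are illustrative values, not the Kelly lab's thresholds.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from FASTQ lines, assuming 4 lines per record."""
    stripped = (line.rstrip("\n") for line in lines if line.strip())
    # zip over the same generator four times groups consecutive lines into records
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, qual


def expected_errors(qual, offset=33):
    """Expected number of errors in a read, E = sum(10 ** (-Q / 10)); Phred+33 assumed."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in qual)


def quality_filter(records, max_ee=1.0, min_len=100):
    """Keep reads with expected errors <= max_ee and length >= min_len.

    Both thresholds are illustrative defaults (assumptions), not pipeline settings.
    """
    for header, seq, qual in records:
        if len(seq) >= min_len and expected_errors(qual) <= max_ee:
            yield header, seq, qual
```

Because `read_fastq` accepts any iterable of lines, an open file handle can be passed directly, so large FASTQ files are streamed rather than loaded into memory.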
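Steps 3 and 4 are linked: amplicons can be sequenced in either orientation, so a read that begins with the reverse primer has to be reverse-complemented (what `seqtk seq -r` does) before its primers can be trimmed consistently. The following is a minimal Python sketch of that logic, not the R code or cutadapt settings the project will actually use. It assumes merged reads laid out as forward primer + insert + reverse complement of the reverse primer, allows IUPAC ambiguity codes in the primers, and tolerates up to `max_mismatches` substitutions (no indels) per primer. The function names and the primers in any example are hypothetical.

```python
# IUPAC nucleotide codes: each code maps to the bases it can match.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles ambiguity codes (e.g. R <-> Y, K <-> M).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement a DNA sequence, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_mismatches(segment, primer):
    """Count read bases not allowed by the aligned IUPAC primer base (substitutions only)."""
    if len(segment) < len(primer):
        return len(primer)  # too short to contain the primer: treat as a full mismatch
    return sum(1 for base, code in zip(segment, primer) if base not in IUPAC[code])


def orient_and_trim(seq, fwd_primer, rev_primer, max_mismatches=2):
    """Return the insert between the primers in forward orientation, or None if primers are absent."""
    seq = seq.upper()
    fwd_primer, rev_primer = fwd_primer.upper(), rev_primer.upper()
    if primer_mismatches(seq, fwd_primer) > max_mismatches:
        seq = reverse_complement(seq)  # the read may start with the reverse primer instead
    rc_rev = reverse_complement(rev_primer)
    if primer_mismatches(seq, fwd_primer) > max_mismatches:
        return None  # forward primer not found in either orientation
    if primer_mismatches(seq[-len(rc_rev):], rc_rev) > max_mismatches:
        return None  # reverse primer not found at the 3' end
    insert = seq[len(fwd_primer):len(seq) - len(rc_rev)]
    return insert or None
```

Reads for which `orient_and_trim` returns `None` would be discarded, which is roughly what cutadapt's `--discard-untrimmed` option does for reads without a matching adapter.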
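Steps 5 to 7 operate on plain sequences once primers are removed. As an illustrative Python sketch (not the grep/awk or usearch commands listed above), the functions below drop reads containing a long homopolymer, collapse identical sequences into uniques with an abundance count, and remove singletons. The homopolymer cutoff `min_run=7` is a hypothetical value, and the `Uniq` label prefix is made up; the `;size=N;` abundance annotation follows the usearch labeling convention.

```python
from collections import Counter
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base (0 for an empty string)."""
    return max((sum(1 for _ in run) for _, run in groupby(seq)), default=0)


def clean_and_dereplicate(seqs, min_run=7, min_size=2):
    """Homopolymer filter, dereplication, and singleton removal in one pass.

    - Step 5: drop sequences with a run of min_run or more identical bases (min_run is hypothetical).
    - Step 6: collapse identical sequences, keeping an abundance count for each.
    - Step 7: drop uniques seen fewer than min_size times (min_size=2 removes singletons).
    """
    counts = Counter(s for s in seqs if longest_homopolymer(s) < min_run)
    uniques = [(seq, n) for seq, n in counts.items() if n >= min_size]
    # Sort by abundance (descending), breaking ties alphabetically so output is deterministic.
    return sorted(uniques, key=lambda item: (-item[1], item[0]))


def to_fasta(uniques, prefix="Uniq"):
    """Format dereplicated sequences as FASTA with usearch-style ';size=N;' labels."""
    return "\n".join(f">{prefix}{i};size={n};\n{seq}" for i, (seq, n) in enumerate(uniques, 1))
```

The output is sorted by decreasing abundance because that is the input order usearch's UPARSE-OTU clustering (step 8) expects.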

Timeline

  • Week 4: Install and get familiar with all necessary bioinformatic tools for my project
  • Week 5: Steps 1 & 2 of the objectives
  • Week 6: Steps 3-6
  • Week 7: Steps 7 & 8
  • Week 8: Steps 9 & 10
  • Week 9: Check results and write the final report as a Markdown document
  • Week 10: Project presentation

Repository Organization


Raw data and other data files being generated as the analysis progresses


All files containing code for the data analysis


Markdown files documenting steps of analyses and progress


Helpful tutorials for using GitHub


Daily journal entries detailing progress

NOTE: Other folders will be added if additional material does not fit the original project organization.