marta-eDNA data analysis

Deciphering community structure from intertidal Puget Sound water samples

My project involves analyzing data published by Ryan Kelly and his team at SMEA. Starting from the raw data, I will use two of the files (forward and reverse sequences) to determine which taxa are present, following their established protocols and pipeline.

My goals are:

To learn how to use the bioinformatics platforms necessary for my project, such as Markdown, GitHub, Git Bash, and Jupyter
To get familiar with the specific programs needed to clean, pair, and cluster data (PEAR, usearch, cutadapt, seqtk, blastn, and MEGAN)
To apply what I learn during the quarter to the analysis of my own data once I collect my samples in December 2018

Objectives

1. Merge paired-end reads with R
2. Quality filter with R
3. Remove primers with R
4. Reverse complement appropriate sequences with seqtk
5. Remove sequences containing homopolymers (BSD Unix: grep; awk)
6. Consolidate identical sequences with usearch
7. Remove singletons with usearch
8. Cluster sequences into OTUs using usearch
9. BLAST clusters using blastn
10. Perform common ancestor grouping in MEGAN
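
To make the middle of the pipeline concrete, here is a minimal Python sketch of what objectives 4 to 7 do conceptually: reverse complementing, dropping reads with long homopolymer runs, consolidating identical sequences, and removing singletons. It is an illustration only, not the Kelly lab pipeline; the real analysis will use seqtk, grep/awk, and usearch as listed above. The function names, the toy reads, and the homopolymer threshold (`max_run=7`) are assumptions made for this example.

```python
import re
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_homopolymer(seq, max_run=7):
    """Return True if any base repeats more than max_run times in a row.

    max_run=7 is an assumed threshold for illustration only.
    """
    # (.) captures one base; \1{max_run,} requires at least max_run more copies.
    return re.search(r"(.)\1{%d,}" % max_run, seq) is not None


def dereplicate(seqs):
    """Consolidate identical sequences into a sequence -> abundance mapping."""
    return Counter(seqs)


def drop_singletons(counts, min_count=2):
    """Keep only sequences observed at least min_count times."""
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Toy reads (made up for this example, not real data).
reads = ["ACGTACGT", "ACGTACGT", "AAAAAAAAAACGT", "TTGCA"]

kept = [r for r in reads if not has_homopolymer(r)]
unique = drop_singletons(dereplicate(kept))

print(reverse_complement("AACG"))  # CGTT
print(unique)                      # {'ACGTACGT': 2}
```

In this toy run, the read with ten consecutive A's is removed by the homopolymer filter, and `TTGCA` is dropped because it occurs only once.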

Timeline

Week 4: Install and get familiar with all necessary bioinformatic tools for my project
Week 5: Steps 1 & 2 in objectives
Week 6: Steps 3-6
Week 7: Steps 7 & 8
Week 8: Steps 9 & 10
Week 9: Check results and write the final report in a Markdown document
Week 10: Project presentation

Repository Organization

Data

Raw data and other data files being generated as the analysis progresses

Analysis

All files that contain code for analysis of data

Notebook

Markdown files documenting steps of analyses and progress

Tutorial

Helpful tutorials for GitHub usage

Progress

Journal entries with details of progress each day

fish546-2018/marta-eDNA