My project involves analyzing data published by Ryan Kelly and his team at SMEA. I will start with the raw data and, from two of the files (forward and reverse sequences), determine which taxa are present, following their established protocols and pipeline.
- Learn how to use the bioinformatics tools and platforms necessary for my project, such as Markdown, GitHub, Git Bash, and Jupyter
- Get familiar with the specific programs needed to clean, pair, and cluster data (PEAR, usearch, cutadapt, seqtk, blastn, and MEGAN)
- Apply what I learn during the quarter to analyze my own data when I collect my samples in December 2018
- Merge paired-end reads with R
- Quality filter with R
- Remove primers with R
- Reverse complement appropriate sequences with seqtk
- Remove sequences containing homopolymers (BSD Unix: grep; awk)
- Consolidate identical sequences with usearch
- Remove singletons with usearch
- Cluster sequences into OTUs using usearch
- BLAST clusters using blastn
- Perform common ancestor grouping in MEGAN.
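
To make a few of the steps above concrete, here is a minimal pure-Python sketch of reverse complementing, homopolymer filtering, consolidating identical sequences, and removing singletons. It is only an illustration of what these steps do, not the pipeline itself, which uses seqtk, grep/awk, and usearch. The function names, the `max_run=8` homopolymer threshold, and the example reads are all assumptions made for this sketch.

```python
import re
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_homopolymer(seq, max_run=8):
    """Return True if any base repeats more than max_run times in a row.

    max_run is an assumed threshold chosen for illustration only.
    """
    # (.) captures one base; \1{max_run,} requires at least max_run more
    # copies of it, so the whole match is a run longer than max_run.
    return re.search(r"(.)\1{%d,}" % max_run, seq) is not None


def dereplicate(seqs):
    """Consolidate identical sequences into a {sequence: count} mapping."""
    return Counter(seqs)


def drop_singletons(counts, min_count=2):
    """Keep only sequences observed at least min_count times."""
    return {seq: n for seq, n in counts.items() if n >= min_count}


# Made-up example reads, not real data.
reads = ["ACGTTGCA", "ACGTTGCA", "TTTTTTTTTTGCA", "GGCATTAC"]
kept = [r for r in reads if not has_homopolymer(r)]
abundant = drop_singletons(dereplicate(kept))
print(abundant)  # {'ACGTTGCA': 2}
```

In this toy example the read with ten consecutive Ts is discarded by the homopolymer filter, and `GGCATTAC` is dropped because it occurs only once.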
- Week 4: Install and get familiar with all the bioinformatics tools necessary for my project
- Week 5: Steps 1 & 2 in objectives
- Week 6: Steps 3-6
- Week 7: Steps 7 & 8
- Week 8: Steps 9 & 10
- Week 9: Check results and write the final report in a Markdown document
- Week 10: Project presentation
Raw data and other data files being generated as the analysis progresses
All files that contain code for analysis of data
Markdown files documenting steps of analyses and progress
Helpful tutorials for GitHub usage
Journal entries with details of progress each day