16S-pipeline

A usearch-based 16S community profiling pipeline for the analysis of ribosomal amplicon sequencing data.

You will need the following tools to use this pipeline:

usearch7 from Rob Edgar's Drive5 site as described here:
Edgar, R.C. (2013) UPARSE: Highly accurate OTU sequences from microbial amplicon reads, Nature Methods Pubmed:23955772, dx.doi.org/10.1038/nmeth.2604
The naive Bayesian RDP Classifier from the RDP Project, available on GitHub, as described here:
Wang, Q, G. M. Garrity, J. M. Tiedje, and J. R. Cole. 2007. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol. 73(16):5261-7.
A working MySQL server installation (contact me for a SQLite3 version)

Edit the file globals to set the paths to the above tools and the MySQL host and database name. Leave TRUNCLEN unchanged for now.
Look at the 0.setup script to make sure the paths to the data are correct and adjust as necessary to find the .fasta, .qual, and mapping.txt files.
Use the EXECUTE command to run the pipeline and review the results. In particular, pay attention to the data in the 1.quality_filter.stats.log file. Using the rules described on Rob Edgar's site, decide on the TRUNCLEN and possibly the MAXEE parameters.
The 1.quality_filter.stats.log file reports the % of reads falling into each read length bin, both per bin and cumulatively. A choice needs to be made that balances the accumulated % of reads against the avgEE (cumulative error rate average).
After updating TRUNCLEN (and MAXEE, if you changed it), rerun the EXECUTE command and examine the output.
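
To make the TRUNCLEN / MAXEE trade-off more concrete, here is a minimal Python sketch. It is not part of the pipeline and does not read or reproduce the 1.quality_filter.stats.log format. The function names and the input format (a list of per-read Phred quality scores) are hypothetical and chosen only for illustration. It assumes the standard definition of expected errors for a read, the sum of 10^(-Q/10) over its bases, and that reads shorter than the truncation length are discarded.

```python
def expected_errors(quals):
    """Expected number of errors in a read: the sum of per-base error
    probabilities, where a Phred score Q means P(error) = 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in quals)


def truncation_table(reads, lengths, max_ee):
    """For each candidate truncation length, report a tuple of:
      - the truncation length,
      - % of all reads at least that long (shorter reads are assumed discarded),
      - mean expected errors of those reads after truncation,
      - % of all reads that are long enough AND have expected errors <= max_ee.
    `reads` is a list of per-read Phred score lists (hypothetical input format).
    """
    total = len(reads)
    if total == 0:
        return []
    rows = []
    for trunclen in lengths:
        # Expected errors of each sufficiently long read, truncated to trunclen.
        ees = [expected_errors(q[:trunclen]) for q in reads if len(q) >= trunclen]
        pct_long = 100.0 * len(ees) / total
        mean_ee = sum(ees) / len(ees) if ees else float("nan")
        pct_pass = 100.0 * sum(ee <= max_ee for ee in ees) / total
        rows.append((trunclen, pct_long, mean_ee, pct_pass))
    return rows


if __name__ == "__main__":
    # Toy data: three 100-base reads of differing quality and one 50-base read.
    toy_reads = [[30] * 100, [20] * 100, [30] * 50, [10] * 100]
    print("trunclen  %long_enough  mean_EE  %pass")
    for trunclen, pct_long, mean_ee, pct_pass in truncation_table(
        toy_reads, [50, 100], max_ee=2.0
    ):
        print(f"{trunclen:8d}  {pct_long:12.1f}  {mean_ee:7.2f}  {pct_pass:5.1f}")
```

In this sketch, a longer truncation length keeps more bases per read but discards more short reads and accumulates more expected errors, which is the same kind of balance you weigh when reading the stats log.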