OTU-clustering

Scripts for benchmarking and comparing the short-read OTU clustering tools available via QIIME 1.9.0. The full benchmark can be launched with the script OTU-clustering/shell_scripts/launch_benchmark.sh, which executes all software tools and performs the analysis using the datasets and scripts listed below. Before running it, set the working directory path in launch_benchmark.sh. Only the commands that launch software, in OTU-clustering/shell_scripts/commands_16S.sh and OTU-clustering/shell_scripts/commands_18S.sh, are submitted through qsub; these can easily be modified to run on any system.
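
As a rough illustration of what switching between qsub and direct execution can look like, here is a minimal Python sketch. It is not taken from commands_16S.sh or commands_18S.sh: the helper name wrap_command, the job name and the example file names are hypothetical, and the "echo ... | qsub" form is assumed to be accepted by the local scheduler.

```python
import shlex


def wrap_command(command, use_qsub=True, job_name="otu_benchmark"):
    """Return a shell line that either submits `command` via qsub or runs it directly.

    Hypothetical helper for illustration only; the repository's shell scripts
    may build their qsub calls differently.
    """
    if not use_qsub:
        # No scheduler: the command is run as-is on the local machine.
        return command
    # Pipe the quoted command to qsub on stdin; -N sets the job name.
    return "echo {} | qsub -N {}".format(shlex.quote(command), shlex.quote(job_name))


if __name__ == "__main__":
    # seqs.fna and de_novo_out are placeholder names, not files from the benchmark.
    cmd = "pick_de_novo_otus.py -i seqs.fna -o de_novo_out"
    print(wrap_command(cmd))                  # line for a cluster with qsub
    print(wrap_command(cmd, use_qsub=False))  # line for a machine without a scheduler
```

On a system without qsub, only the wrapping needs to change; the underlying QIIME command stays the same.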

Dependencies

QIIME 1.9

Simulate reads:

Benchmark:

BLAST 2.2.29+
USEARCH Uchime (7.0.1090)
UPARSE (7.0.1090)

Install

git clone https://github.com/ekopylova/OTU-clustering.git

Datasets

Read datasets: ftp://ftp.microbio.me/pub/supplemental_otu_clustering_datasets.tar.gz
Greengenes 13.8: ftp://ftp.greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz
SILVA 111: ftp://ftp.microbio.me/pub/QIIME_nonstandard_referencedb/Silva_111.tgz
16S PyNAST template: http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed
18S PyNAST template: ftp://ftp.microbio.me/pub/core_Silva119_alignment.fna.gz
For chimera checking 16S, the gold database: http://drive5.com/uchime/uchime_download.html
For chimera checking 18S, the SILVA 97% representative set from SILVA 111 (included in the SILVA 111 download listed above)

Scripts

The benchmark and comparative analysis can be run using the following scripts, in the order given. All scripts other than launch_benchmark.sh require input arguments, all of which are defined in launch_benchmark.sh.

Launch full benchmark (executes all scripts below):
OTU-clustering/shell_scripts/launch_benchmark.sh

Otherwise, the user may launch each script individually:

Simulate even and staggered community reads:
OTU-clustering/shell_scripts/simulate_reads.sh
Launch all software (via QIIME’s pick_closed_reference_otus.py, pick_de_novo_otus.py and pick_open_reference_otus.py) on 16S datasets:
OTU-clustering/shell_scripts/commands_16S.sh
Launch all software on 18S datasets:
OTU-clustering/shell_scripts/commands_18S.sh
Remove singleton OTUs (OTUs consisting of only 1 read) from the final OTU tables generated by commands_16S.sh and commands_18S.sh:
OTU-clustering/python_scripts/run_filter_singleton_otus.py
Summarize taxonomy using filtered OTU tables:
OTU-clustering/python_scripts/run_summarize_taxa.py
Summarize filtered OTU tables:
OTU-clustering/python_scripts/run_summarize_tables.py
Compute true positive (TP), false positive (FP), false negative (FN), precision, recall, F-measure, FP-chimera, FP-known and FP-other metrics from the summarized taxonomy results:
OTU-clustering/python_scripts/run_compute_precision_recall.py
Generate alpha diversity plots:
OTU-clustering/python_scripts/run_single_rarefaction_and_plot.py
Generate beta diversity plots:
OTU-clustering/python_scripts/run_beta_diversity_and_procrustes.py
Generate taxonomy comparison tables:
OTU-clustering/python_scripts/run_compare_taxa_summaries.py
Generate taxonomy stacked bar plots:
OTU-clustering/python_scripts/run_generate_taxa_barcharts.py
Plot TP, FP-chimera, FP-known and FP-other results:
OTU-clustering/python_scripts/plot_tp_fp_distribution.py
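
Two of the steps listed above, singleton removal and the precision/recall computation, can be sketched in a few lines of Python. This is a simplified illustration, not the code in run_filter_singleton_otus.py or run_compute_precision_recall.py. It assumes an OTU table held as a plain dictionary of per-sample read counts and taxa compared as sets of names. The function names and toy data are hypothetical, and the FP-chimera, FP-known and FP-other breakdown is not covered.

```python
def filter_singleton_otus(otu_table):
    """Keep only OTUs whose reads, summed over all samples, number at least 2.

    otu_table maps OTU id -> {sample id -> read count}.
    """
    return {
        otu_id: counts
        for otu_id, counts in otu_table.items()
        if sum(counts.values()) > 1
    }


def precision_recall_f(expected_taxa, observed_taxa):
    """Compare observed taxa against the expected (ground truth) taxa."""
    expected, observed = set(expected_taxa), set(observed_taxa)
    tp = len(expected & observed)   # expected taxa that were recovered
    fp = len(observed - expected)   # reported taxa that were not expected
    fn = len(expected - observed)   # expected taxa that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F-measure": f_measure}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    table = {
        "otu1": {"even": 10, "staggered": 3},
        "otu2": {"even": 1, "staggered": 0},   # singleton, removed by the filter
        "otu3": {"even": 4, "staggered": 0},
        "otu4": {"even": 2, "staggered": 1},
    }
    taxonomy = {"otu1": "GenusA", "otu2": "GenusX", "otu3": "GenusB", "otu4": "GenusZ"}
    expected = {"GenusA", "GenusB", "GenusC"}

    filtered = filter_singleton_otus(table)
    observed = {taxonomy[otu_id] for otu_id in filtered}
    print(precision_recall_f(expected, observed))
```

Filtering before scoring matters here: a singleton OTU assigned to an unexpected taxon would otherwise count as a false positive and lower precision.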

Citing