Scripts for benchmarking and comparing short-read OTU clustering tools available via QIIME 1.9.0.
The full benchmark can be launched with the script OTU-clustering/shell_scripts/launch_benchmark.sh. This script will execute all software tools and perform the analyses using the datasets and scripts below. The user must modify the working directory path in launch_benchmark.sh prior to executing this script.
Only the commands for launching software in OTU-clustering/shell_scripts/commands_16S.sh and OTU-clustering/shell_scripts/commands_18S.sh are submitted through the qsub environment, although they are easily modifiable to run on any system.
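To illustrate what "easily modifiable" could look like, here is a minimal sketch of a hypothetical helper that either submits a command through qsub or runs it locally. It assumes a PBS/SGE-style qsub that reads the job script from standard input and accepts `-N` for the job name; the function names and the example job name are illustrative and are not taken from commands_16S.sh or commands_18S.sh.

```python
import subprocess


def build_job(command, job_name, use_qsub=True):
    """Return (argv, stdin_text) for running `command` (hypothetical helper).

    With use_qsub=True the command line is piped to qsub on standard input
    (assumes a PBS/SGE-style qsub; -N sets the job name).
    With use_qsub=False the command is run directly through the local shell.
    """
    if use_qsub:
        return ["qsub", "-N", job_name], command + "\n"
    return ["sh", "-c", command], None


def run_job(command, job_name, use_qsub=True):
    """Submit or run `command` and return the exit status of qsub or the shell."""
    argv, stdin_text = build_job(command, job_name, use_qsub)
    result = subprocess.run(argv, input=stdin_text, text=True)
    return result.returncode


if __name__ == "__main__":
    # Placeholder command and paths; only the argv is built, nothing is executed.
    cmd = "pick_closed_reference_otus.py -i reads.fna -o closed_ref_out"
    print(build_job(cmd, "closed_ref_16S", use_qsub=False))
```

Keeping command construction separate from execution means the same command strings can be reused unchanged whether or not a cluster scheduler is available.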
Simulate reads:
Benchmark:
- BLAST 2.2.29+
- USEARCH Uchime (7.0.1090)
- UPARSE (7.0.1090)
- This repository: git clone https://github.com/ekopylova/OTU-clustering.git
- Read datasets: ftp://ftp.microbio.me/pub/supplemental_otu_clustering_datasets.tar.gz
- Greengenes 13.8: ftp://ftp.greengenes.microbio.me/greengenes_release/gg_13_5/gg_13_8_otus.tar.gz
- SILVA 111: ftp://ftp.microbio.me/pub/QIIME_nonstandard_referencedb/Silva_111.tgz
- 16S PyNAST template: http://greengenes.lbl.gov/Download/Sequence_Data/Fasta_data_files/core_set_aligned.fasta.imputed
- 18S PyNAST template: ftp://ftp.microbio.me/pub/core_Silva119_alignment.fna.gz
- For 16S chimera checking, the Gold database: http://drive5.com/uchime/uchime_download.html
- For 18S chimera checking, the SILVA 97% representative set from SILVA 111 (see the SILVA 111 download above)
The benchmark and the comparative analysis can be executed using the following scripts, in the order given. All scripts other than launch_benchmark.sh require input arguments, all of which are defined in launch_benchmark.sh.
- Launch full benchmark (executes all scripts below):
OTU-clustering/shell_scripts/launch_benchmark.sh
Alternatively, the user may launch each script individually:
- Simulate even and staggered community reads:
OTU-clustering/shell_scripts/simulate_reads.sh
- Launch all software (via QIIME’s pick_closed_reference_otus.py, pick_de_novo_otus.py and pick_open_reference_otus.py) on 16S datasets:
OTU-clustering/shell_scripts/commands_16S.sh
- Launch all software on 18S datasets:
OTU-clustering/shell_scripts/commands_18S.sh
- Remove singleton OTUs (OTUs consisting of only 1 read) from the final OTU tables generated by commands_16S.sh and commands_18S.sh:
OTU-clustering/python_scripts/run_filter_singleton_otus.py
- Summarize taxonomy using filtered OTU tables:
OTU-clustering/python_scripts/run_summarize_taxa.py
- Summarize filtered OTU tables:
OTU-clustering/python_scripts/run_summarize_tables.py
- Compute the true positive (TP), false positive (FP), false negative (FN), precision, recall, F-measure, FP-chimera, FP-known and FP-other metrics using the summarized taxonomy results:
OTU-clustering/python_scripts/run_compute_precision_recall.py
- Generate alpha diversity plots:
OTU-clustering/python_scripts/run_single_rarefaction_and_plot.py
- Generate beta diversity plots:
OTU-clustering/python_scripts/run_beta_diversity_and_procrustes.py
- Generate taxonomy comparison tables:
OTU-clustering/python_scripts/run_compare_taxa_summaries.py
- Generate taxonomy stacked bar plots:
OTU-clustering/python_scripts/run_generate_taxa_barcharts.py
- Plot TP, FP-chimera, FP-known and FP-other results:
OTU-clustering/python_scripts/plot_tp_fp_distribution.py
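Singleton removal amounts to dropping every OTU whose total read count across all samples is 1. The sketch below shows that filter on an OTU table held as a plain dict mapping an OTU ID to per-sample counts; run_filter_singleton_otus.py operates on QIIME OTU tables, so this data structure and the function name are assumptions for illustration only.

```python
def filter_singleton_otus(otu_table, min_count=2):
    """Drop OTUs whose total count over all samples is below min_count.

    otu_table maps an OTU ID to a list of per-sample read counts
    (a hypothetical stand-in for a QIIME/BIOM OTU table). With the
    default min_count=2, OTUs consisting of a single read are removed.
    """
    return {
        otu_id: counts
        for otu_id, counts in otu_table.items()
        if sum(counts) >= min_count
    }


if __name__ == "__main__":
    table = {
        "OTU_1": [10, 4, 0],
        "OTU_2": [0, 1, 0],  # singleton: one read in total
        "OTU_3": [1, 1, 0],  # two reads in total, kept
    }
    print(sorted(filter_singleton_otus(table)))  # ['OTU_1', 'OTU_3']
```

Note that an OTU seen once in each of two samples is kept, because the threshold applies to the total count rather than to any single sample.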
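Precision, recall and F-measure follow the standard definitions from TP, FP and FN counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F-measure is their harmonic mean. The sketch below applies those formulas to sets of expected and observed taxon names. Treating each taxon name as the unit of comparison is an assumption made here for illustration; it may differ from how run_compute_precision_recall.py matches taxonomy strings, and it does not reproduce the FP-chimera, FP-known and FP-other breakdown.

```python
def precision_recall_f(expected_taxa, observed_taxa):
    """Return (tp, fp, fn, precision, recall, f_measure) for two taxon sets."""
    expected = set(expected_taxa)
    observed = set(observed_taxa)
    tp = len(expected & observed)  # expected taxa that were recovered
    fp = len(observed - expected)  # reported taxa absent from the expected set
    fn = len(expected - observed)  # expected taxa that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return tp, fp, fn, precision, recall, f_measure


if __name__ == "__main__":
    expected = ["g__Bacillus", "g__Escherichia", "g__Listeria", "g__Salmonella"]
    observed = ["g__Bacillus", "g__Escherichia", "g__Listeria", "g__Shigella"]
    print(precision_recall_f(expected, observed))
    # (3, 1, 1, 0.75, 0.75, 0.75)
```

Empty inputs return zeros instead of raising a division error, which keeps the metrics defined for a method that reports no taxa at all.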
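The alpha diversity step works on rarefied data. As a hedged sketch of the idea behind a single rarefaction (not the code used by run_single_rarefaction_and_plot.py), the functions below subsample one sample's reads without replacement to a fixed depth and count the observed OTUs. The function names, the seed handling and the choice of observed OTU richness as the metric are assumptions for illustration.

```python
import random


def rarefy(counts, depth, seed=None):
    """Subsample per-OTU read counts to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand counts into one entry per read, labelled by its OTU index.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    rng = random.Random(seed)
    sampled = rng.sample(reads, depth)
    rarefied = [0] * len(counts)
    for i in sampled:
        rarefied[i] += 1
    return rarefied


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


if __name__ == "__main__":
    sample = [50, 30, 15, 4, 1]
    sub = rarefy(sample, depth=20, seed=42)
    print(sub, sum(sub), observed_otus(sub))
```

Sampling without replacement guarantees that no OTU receives more reads than it had originally, and fixing the seed makes a given rarefaction reproducible.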
If you use any of the data or code included in this repository, please cite with the URL: https://github.com/ekopylova/OTU-clustering.git