-latest -r main/stable. As a result, you must run nextflow drop https://github.com/jhuapl-bio/taxtriage first. This only applies to pipelines run by calling the remote repo with the parameters mentioned above. If you expect to make local changes frequently, you should instead git clone and git pull the repository manually and run the pipeline from the main.nf file. See here for more info.
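The local clone-and-pull workflow described above can be sketched as a small shell helper. This is a minimal sketch, assuming git is installed; the function name sync_taxtriage and the default ~/taxtriage directory are illustrative choices, not part of TaxTriage.

```bash
#!/usr/bin/env bash
# Hypothetical helper: clone the TaxTriage repo once, then fast-forward it on later calls.
# Usage: sync_taxtriage [repo_url] [dest_dir]
sync_taxtriage() {
  local repo=${1:-https://github.com/jhuapl-bio/taxtriage.git}
  local dest=${2:-$HOME/taxtriage}
  if [ -d "$dest/.git" ]; then
    # Existing clone: pull new commits without creating merge commits.
    git -C "$dest" pull --ff-only --quiet
  else
    # First run: create the local copy.
    git clone --quiet "$repo" "$dest"
  fi
}
```

After running sync_taxtriage, launch the local copy with cd ~/taxtriage && nextflow run ./main.nf -profile test,docker. Using --ff-only makes the update fail loudly instead of silently merging if you have diverging local commits.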
TaxTriage is a bioinformatics best-practice analysis pipeline for APHL that produces triage classification reports.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.
TaxTriage is designed as a pipeline for giving an initial triage of taxonomic classifications, using Kraken2 database(s), that can then be ingested into a CLIA-style report format. It is under active development, but in its current state it can run a set number of samples end-to-end using a user-created samplesheet in .csv format. The output is an HTML report, which is highly interactive and distributable. This pipeline uses the Nextflow ecosystem and is also available as a module in Basestack. Basestack is currently undergoing improvements to allow easier usage of Nextflow pipelines (including TaxTriage), scheduled for release in early August.
Efforts are underway to provide full support for this pipeline on nf-core, giving a seamless deployment method. The pipeline also requires Docker or Singularity (CE ONLY v4+) for the individual modules within it. Because these modules are separate from the TaxTriage source code, we recommend first following the examples in the usage details: they run the pipeline and install all dependencies automatically, while also giving you example outputs and a better feel for how the pipeline operates.
See Here for full usage details
See Here for troubleshooting & FAQ
TaxTriage requires two primary installs to work:
- Nextflow
- Singularity or Docker (recommended)
Follow the instructions here, or run these commands in your WSL2, native Linux, or macOS environment:
# Make sure that Java v11+ is installed:
java -version
# Install Nextflow
curl -fsSL get.nextflow.io | bash
Note: moving the binary into your home directory does not require sudo; a global install (below) does. If you are on an HPC, make sure that nextflow is in your $PATH if it is not globally available.
Place it in your $PATH:
# Add Nextflow binary to your user's PATH (create ~/bin first if it does not exist):
mkdir -p ~/bin
mv nextflow ~/bin/
To install globally instead (requires sudo), run:
sudo mv nextflow /usr/local/bin
When complete, verify the installation with nextflow -v, which prints the installed version.
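Both prerequisites from this section can be checked with a short shell sketch. The helper names java_major and check_prereqs are hypothetical, and the version parsing assumes the usual "java -version" output format (for example openjdk version "17.0.2"), handling both the legacy "1.8" scheme and the modern one.

```bash
#!/usr/bin/env bash
# Extract the major Java version from `java -version` output.
# Legacy scheme "1.8.0_292" -> 8; modern scheme "17.0.2" -> 17.
java_major() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) ver=${ver#1.} ;;   # drop the legacy "1." prefix
  esac
  printf '%s\n' "${ver%%[._-]*}"   # keep everything before the first . _ or -
}

# Warn (on stderr) about any prerequisite that does not look usable.
check_prereqs() {
  local major
  major=$(java_major "$(java -version 2>&1)")
  if [ -z "$major" ] || [ "$major" -lt 11 ]; then
    echo "Java 11+ not found (detected: ${major:-none})" >&2
  fi
  if ! command -v nextflow >/dev/null 2>&1; then
    echo "nextflow is not on \$PATH; try: export PATH=\"\$HOME/bin:\$PATH\"" >&2
  fi
}
```

Run check_prereqs after installing; it prints nothing when Java 11+ and nextflow are both found.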
Choose A (Docker, recommended) or B (Singularity). If you are on an HPC, talk with your IT team about setting up B (Singularity). You do NOT need to install both.
Follow the steps for your OS here. If you are on WSL2 (Windows), choose Docker Desktop for Windows; it should then be available automatically in your WSL environment.
Make sure you have either Docker or Singularity installed, as well as Nextflow
This will pull the test data and run the pipeline. It should take ~10-15 minutes.
nextflow run https://github.com/jhuapl-bio/taxtriage -r main -latest -profile test,docker -resume
❗ If you want Singularity instead, specify it in the profile in place of docker, for example: -profile test,singularity
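If you are unsure which container engine is available, the profile can be chosen automatically. This is a minimal sketch: choose_profile is a hypothetical helper that only checks whether a docker or singularity executable is on your PATH (it does not verify that the Docker daemon is running), preferring Docker as recommended above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<base>,docker" or "<base>,singularity" depending on
# which executable is found first on PATH; fail if neither is installed.
choose_profile() {
  local base=${1:-test}
  if command -v docker >/dev/null 2>&1; then
    printf '%s,docker\n' "$base"
  elif command -v singularity >/dev/null 2>&1; then
    printf '%s,singularity\n' "$base"
  else
    echo "neither docker nor singularity found on PATH" >&2
    return 1
  fi
}
```

For example: profile=$(choose_profile test) && nextflow run https://github.com/jhuapl-bio/taxtriage -r main -latest -profile "$profile" -resume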
Follow the steps here
- Run the command:
nextflow drop -f https://github.com/jhuapl-bio/taxtriage
nextflow pull https://github.com/jhuapl-bio/taxtriage
cp -r ~/.nextflow/assets/jhuapl-bio/taxtriage ~/taxtriage
cd ~/taxtriage
nextflow drop -f https://github.com/jhuapl-bio/taxtriage
- Running Kraken2 and FASTQC report with the k2_viral db
nextflow run https://github.com/jhuapl-bio/taxtriage \
--outdir tmp_viral \
-resume \
--input examples/Samplesheet.csv \
--taxtab "default" -r main -latest \
--db "viral" --download_db \
-profile local,docker
nextflow run https://github.com/jhuapl-bio/taxtriage \
--input examples/Samplesheet.csv -r main -latest \
--db viral --download_db --skip_assembly \
--outdir tmp --max_memory 10GB --max_cpus 3 \
-profile docker -resume --demux --remove_taxids "9606"
Remember: if you are removing a single taxid, wrap it in single quotes inside the double quotes, for example "'9606'".
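To see what the inner quotes change, note that the shell strips the outer double quotes before Nextflow receives the argument, so only the inner single quotes survive. The hypothetical show_args helper below prints each argument exactly as a program receives it. Why TaxTriage needs the quoted form is an assumption here: presumably it keeps a lone taxid from being interpreted as a number rather than a string.

```bash
#!/usr/bin/env bash
# Print each argument exactly as the receiving program sees it, one per line in brackets.
show_args() {
  local arg
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

show_args --remove_taxids "9606"    # prints [--remove_taxids] then [9606]
show_args --remove_taxids "'9606'"  # prints [--remove_taxids] then ['9606']
```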
nextflow run https://github.com/jhuapl-bio/taxtriage \
--input examples/Samplesheet.csv \
--db "k2_viral" -r main -latest \
--outdir tmp_viral \
-profile local,docker \
-resume
This example uses a local assembly summary text file and a reference FASTA, assuming the reference FASTA is called refer.fasta.
You will need three files locally on your system:
- assembly
- reference_fasta
- db
nextflow run https://github.com/jhuapl-bio/taxtriage \
--input examples/Samplesheet.csv \
--db "k2_viral" -r main -latest \
--outdir tmp --reference_fasta ./refer.fasta \
-profile local,docker \
-resume \
--demux \
--assembly examples/assembly_summary_refseq.txt
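Because this example depends on files that must already exist locally, a quick pre-flight check can catch a wrong path before Nextflow starts. require_paths is a hypothetical helper, not part of TaxTriage; it reports every missing path rather than stopping at the first one.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: return non-zero and list each path that does not exist.
require_paths() {
  local missing=0 p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the example above: require_paths ./refer.fasta examples/assembly_summary_refseq.txt && nextflow run https://github.com/jhuapl-bio/taxtriage ... (with the same options as shown).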
nextflow run https://github.com/jhuapl-bio/taxtriage \
--input examples/Samplesheet_flu.csv \
--db viral --download_db -r main -latest \
--outdir tmp_viral \
-profile local,docker \
--assembly data/databases/flukraken2/library/influenza-fixed.fna --assembly_file_type kraken2 \
-resume
jhuaplbio/taxtriage repo first!
nextflow run ./main.nf -profile test,docker
If you want to download the databases from scratch, you can see them here
Download these databases to your Desktop or wherever is most convenient. Remember the location and pass it to the --db parameter as an absolute path, for example ~/Desktop/flukraken2. Also, remove the --download_db parameter.
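One pitfall with paths like ~/Desktop/flukraken2: the shell does not expand ~ inside quotes, so --db "~/Desktop/flukraken2" passes a literal tilde. A minimal sketch that resolves the directory to an absolute path first; abs_db_path is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a local Kraken2 database directory into an absolute path.
# A leading "~/" is expanded manually because the shell does not expand ~ inside quotes.
abs_db_path() {
  local dir=$1
  case "$dir" in
    "~/"*) dir="$HOME/${dir:2}" ;;
  esac
  if [ ! -d "$dir" ]; then
    echo "database directory not found: $dir" >&2
    return 1
  fi
  (cd "$dir" && pwd -P)   # print the resolved absolute path
}
```

Usage: DB=$(abs_db_path ~/Desktop/flukraken2), then pass --db "$DB" to nextflow run and leave out the database download parameter.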
- Install Nextflow (>=21.10.3)
- Install any of Docker or Singularity (you can follow this tutorial).
- Download the pipeline and test it on a minimal dataset with a single command:
nextflow run https://github.com/jhuapl-bio/taxtriage -profile test,docker --outdir ./outdir
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (docker in the example command above). You can chain multiple config profiles in a comma-separated string.
  - The pipeline comes with config profiles called docker or singularity, which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
  - Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
  - If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
- Start running your own analysis!
nextflow run https://github.com/jhuapl-bio/taxtriage -r main -latest --outdir test_output -profile <local,docker/singularity>
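For the singularity profile, the NXF_SINGULARITY_CACHEDIR variable mentioned above can be set before launching so that images are pulled once and reused across runs. A minimal sketch; the cache directory below is an example choice, not a TaxTriage default, and on an HPC you would typically point it at shared storage.

```bash
#!/usr/bin/env bash
# Keep Singularity images in one reusable location across pipeline runs.
# Respect an existing value; otherwise fall back to an example directory under $HOME.
export NXF_SINGULARITY_CACHEDIR="${NXF_SINGULARITY_CACHEDIR:-$HOME/.singularity/taxtriage_cache}"
mkdir -p "$NXF_SINGULARITY_CACHEDIR"
# To make this persistent, add the export line to ~/.bashrc or your job script.
```

With the variable exported, run the command above with -profile local,singularity.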
- Subsample (OPTIONAL)
- Guppyplex (Oxford Nanopore Only)
- QC Plotting part 1 (pycoQC – Oxford Nanopore)
- Trimming (Trimgalore – Illumina, Porechop – Oxford Nanopore)
- Filtering (Kraken2 – Illumina, Oxford Nanopore)
- QC Plotting part 2 (FastQC – Illumina, Nanoplot – Oxford Nanopore)
- Classification (Kraken2 – Illumina, Oxford Nanopore, Krona Plots)
- Alignment for Stats (BWAMEM2 – Illumina, Minimap2 – Oxford Nanopore)
⚠️ Currently, realignment is based only on individual taxid calls. For example, there will not be a complete realignment of an "order" even if multiple species within that order are detected. For the most part, realignment is limited to more specific ranks such as species, strain, and subspecies.
- Report Generation (MultiQC – Illumina, Oxford Nanopore)
TaxTriage was originally written by Brian Merritt, MS Bioinformatics.
We thank the following people for their extensive assistance in the development of this pipeline:
If you would like to contribute to this pipeline, please see the contributing guidelines.
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
##############################################################################################
Copyright 2022 The Johns Hopkins University Applied Physics Laboratory LLC
All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
##############################################################################################
This software tool was supported by the Cooperative Agreement Number NU60OE000104, funded by the Centers for Disease Control and Prevention through the Association of Public Health Laboratories. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention, the Department of Health and Human Services, or the Association of Public Health Laboratories.