Virus discovery pipeline built upon the Nextflow pipeline framework
Like the bird, raven-nf
is smart and good at finding things (like viruses). Unlike the bird, this pipeline has not been cited by Edgar Allen Poe.
Austin Reid Manny
Nibert Lab, Harvard Medical School
austinmanny@g.harvard.edu
raven-nf
is a virus discovery pipeline for finding viruses in publicly available transcriptome data.
Extension of the original raven
pipeline (writte in Bash, available at github.com/austinreidmanny/raven)
This pipeline simply takes in one or more SRA accession number (for NGS datasets in the NCBI Sequence Read Archive)
and returns a list of viruses and their sequences. All software dependencies are handled by the conda
package manager.
The steps executed by the pipeline are are:
- Log inputs
- Download reads [
NCBI sra-tools
] - Adapter trimming [
TrimGalore!
] - De novo contig assembly [
rnaSPAdes
] - Assembly refinement to map reads to newly assembled contigs [
bwa-mem
] - Taxonomic classification of assemblies [
DIAMOND
] - Translate taxonomy [custom scripts using
JGI-ISF taxonomy server
] - Mapping reads to assemblies to determine coverage values [
bwa-mem
] - Binning [custom code to parse taxonomy results >
seqtk
] - Taxonomic assignment of unassembled reads [
kraken2
] - Visualization of reads-based taxonomic distribution [
krona
] - Print results to user
Download the latest version of raven-nf
at https://github.com/austinreidmanny/raven-nf
Dependencies & requirements are
- Operating system: Linux/macOS
- Nextflow (https://www.nextflow.io)
- Conda (https://docs.conda.io/en/latest/miniconda.html)
- DIAMOND database
- Kraken database
- Krona set up (Conda will install it, but might not put it in your Bash path or initiate the taxonomy database for this)
nextflow run raven.nf --sra "SRR123456"
optional parameters:
--nucl "dna" or "rna" [default = "rna"]
--output "output/dir/" [default="results/"]
--threads # (number of CPUs) [default=4]
--memory # (amount of RAM) [in GB, default=8]
--tempdir "tmp/dir" [default="/tmp/"]
--diamondDB "path/to/db.dmnd" [default = "/n/data1/hms/mbin/nibert/austin/diamond/nr.dmnd"]
--krakenDB "path/to/kraken_db" [default = "/n/data1/hms/mbin/nibert/austin/tools/kraken2/kraken_viruses"]