A Nextflow pipeline for running the ARTIC network's fieldbioinformatics tools (https://github.com/artic-network/fieldbioinformatics), with a focus on monkeypox virus (mpxv).
This pipeline is based on the BCCDC-PHL/ncov2019-artic-nf pipeline, which is a fork of the connor-lab/ncov2019-artic-nf pipeline. It also includes freebayes-based variant calling and additional QC filters, initially introduced in jts/ncov2019-artic-nf. It has been modified to support analysis of monkeypox virus.
flowchart TD
ref[ref.fa]
composite_ref[composite_ref.fa]
primers[primer.bed]
primer_pairs[primer_pairs.tsv]
fastq[fastq_dir]
fastq --> normalizeDepth(normalizeDepth)
composite_ref --> performHostFilter
normalizeDepth(normalizeDepth) --> performHostFilter(performHostFilter)
performHostFilter(performHostFilter) --> readTrimming(readTrimming)
readTrimming(readTrimming) --> filterResidualAdapters(filterResidualAdapters)
filterResidualAdapters --> readMapping(readMapping)
ref --> readMapping(readMapping)
readMapping(readMapping) --> trimPrimerSequences(trimPrimerSequences)
primers --> trimPrimerSequences(trimPrimerSequences)
trimPrimerSequences(trimPrimerSequences) --> callConsensusFreebayes(callConsensusFreebayes)
callConsensusFreebayes(callConsensusFreebayes) --> alignConsensusToReference(alignConsensusToReference)
ref --> alignConsensusToReference
alignConsensusToReference --> consensusAlignment[consensus.aln.fa]
trimPrimerSequences --> makeQCCSV(makeQCCSV)
callConsensusFreebayes --> makeQCCSV
callConsensusFreebayes --> consensus[consensus.fa]
callConsensusFreebayes --> variants[variants.vcf]
ref --> makeQCCSV
primers --> makeQCCSV
primer_pairs --> makeQCCSV
makeQCCSV --> qcCSV(qc.csv)
makeQCCSV --> depthPNG(depth.png)
nextflow run BCCDC-PHL/mpxv-artic-nf -profile conda \
--prefix "output_file_prefix" \
--bed /path/to/primers.bed \
--ref /path/to/ref.fa \
--primer_pairs_tsv /path/to/primer_pairs_tsv \
--composite_ref /path/to/human_and_mpxv_composite_ref \
--directory /path/to/reads \
--outdir /path/to/outputs
An up-to-date version of Nextflow is required because the pipeline is written in DSL2. Following the instructions at https://www.nextflow.io/ to download and install Nextflow should get you a recent-enough version.
The repo contains an environment.yml file, which is used to automatically build the correct conda environment if -profile conda is specified in the command. Although you'll need conda installed, this is probably the easiest way to run this pipeline.
--cache /some/dir can be specified to have a fixed, shared location to store the conda build for use by multiple runs of the workflow.
Important config options are:
Option | Default | Description |
---|---|---|
normalizationTargetDepth | 200 | Target depth of coverage to normalize to prior to alignment |
normalizationMinDepth | 5 | Minimum depth of coverage to normalize to prior to alignment |
keepLen | 50 | Length of reads to keep after primer trimming |
qualThreshold | 20 | Sliding window quality threshold for keeping reads after primer trimming |
varMinFreqThreshold | 0.25 | Allele frequency threshold for ambiguous variant |
varFreqThreshold | 0.75 | Allele frequency threshold for unambiguous variant |
varMinDepth | 10 | Minimum coverage depth to call variant |
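
To illustrate how the three variant-calling thresholds relate to each other, here is a minimal Python sketch. It is not the pipeline's actual implementation: the function name `classify_variant`, the returned labels, and the choice of inclusive comparisons (`>=`) are all assumptions made for illustration. Only the default values come from the table above.

```python
# Defaults copied from the options table; everything else is illustrative.
VAR_MIN_FREQ_THRESHOLD = 0.25  # varMinFreqThreshold
VAR_FREQ_THRESHOLD = 0.75      # varFreqThreshold
VAR_MIN_DEPTH = 10             # varMinDepth


def classify_variant(
    alt_freq: float,
    depth: int,
    min_freq: float = VAR_MIN_FREQ_THRESHOLD,
    freq: float = VAR_FREQ_THRESHOLD,
    min_depth: int = VAR_MIN_DEPTH,
) -> str:
    """Label a candidate variant from its allele frequency and depth.

    Hypothetical helper: boundary handling (>= vs >) is an assumption.
    """
    if depth < min_depth:
        # Too few reads at this position to call a variant at all.
        return "low_depth"
    if alt_freq >= freq:
        # Alternate allele dominates: treated as an unambiguous variant.
        return "unambiguous"
    if alt_freq >= min_freq:
        # Mixed position: treated as an ambiguous variant.
        return "ambiguous"
    # Alternate allele too rare to report.
    return "below_threshold"


if __name__ == "__main__":
    for af, dp in [(0.90, 50), (0.40, 50), (0.10, 50), (0.90, 5)]:
        print(af, dp, classify_variant(af, dp))
```

With the defaults, a position needs at least 10 reads before any call is considered, and the allele frequency then decides between the ambiguous and unambiguous categories.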
By default, sequence depth will be normalized using bbnorm to the value specified by the --normalizationTargetDepth param (default: 200). To skip depth normalization, add the --skip_normalize_depth flag.
A script to do some basic QC is provided in bin/qc.py. It measures the percentage of reference bases covered at a depth of at least varMinDepth, and the length of the longest stretch of consensus sequence with no N bases. This script does not make a QC pass/fail call.
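
The two metrics described above can be sketched in a few lines of Python. This is an illustrative approximation, not the contents of bin/qc.py: the function names `percent_covered` and `longest_no_n_run`, and the input shapes (a list of per-base depths and a consensus string), are assumptions.

```python
import re
from typing import Sequence


def percent_covered(depths: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of reference positions with depth >= min_depth.

    `depths` is assumed to hold one depth value per reference base;
    `min_depth` defaults to the varMinDepth default of 10.
    """
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def longest_no_n_run(consensus: str) -> int:
    """Length of the longest stretch of the consensus without N bases."""
    # Split on runs of N (either case) and take the longest remaining piece.
    runs = re.split(r"[Nn]+", consensus)
    return max((len(run) for run in runs), default=0)


if __name__ == "__main__":
    print(percent_covered([0, 10, 20, 5]))      # 2 of 4 positions pass
    print(longest_no_n_run("ACGTNNACGTACNN"))   # longest N-free stretch
```

Neither function applies a pass/fail cutoff, which mirrors the script's stated behaviour of reporting metrics only.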
A subdirectory for each process in the workflow is created in --outdir. A nml_upload subdirectory containing dehosted fastq files and consensus sequences is included.