Microbiome analysis with QIIME2
- Install nextflow, either directly or with conda
- Install one of docker, singularity, or conda
- In a new directory, write your configurations to the file nextflow.config. See Example config.
- Execute the workflow
nextflow run -resume Liulab/qiime2-nf
For more details on running nextflow, run nextflow help.
Using slurm
Simply tell nextflow to use the slurm profile. Multiple profiles can be specified together, separated by a comma. For example:
nextflow run -resume Liulab/qiime2-nf -profile docker,slurm
On KSU's Beocat, singularity and slurm are available. It's recommended to use the provided profiles:
nextflow run -resume Liulab/qiime2-nf -profile singularity,slurm
The input is expected to be a pair of gzipped FASTQ files, one for the forward and one for the reverse read sequences, and a text file listing the sample barcodes.
By default, the pipeline will look for fastq files with R1 and R2 in their name under the data/ subdirectory (relative to the current working directory). It will also look for any file with the .txt extension to use as the barcodes file.
You can either prepare your data according to this default or change where to find the data with the reads and barcode parameters. See Optional pipeline parameters.
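Before launching the pipeline, it can help to check which files the default layout would pick up. The function below is a minimal sketch, not part of qiime2-nf: list_default_inputs is a hypothetical helper, and it approximates the default patterns with plain shell globs rather than reproducing how nextflow resolves them.

```shell
# list_default_inputs [DIR]
# Hypothetical helper: prints the files under DIR/data that look like the
# default inputs (fastq files with R1 or R2 in the name, plus any .txt file).
# DIR defaults to the current working directory.
list_default_inputs() {
    dir=${1:-.}
    for f in "$dir"/data/*R1*.fastq "$dir"/data/*R1*.fastq.gz \
             "$dir"/data/*R2*.fastq "$dir"/data/*R2*.fastq.gz; do
        # An unmatched glob stays literal, so keep only paths that exist
        [ -e "$f" ] && echo "reads:   $f"
    done
    for f in "$dir"/data/*.txt; do
        [ -e "$f" ] && echo "barcode: $f"
    done
    return 0
}
```

Running list_default_inputs from the project directory should print two reads lines and one barcode line for a typical single-run layout. More than one barcode line suggests that the barcode parameter should be set explicitly.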
The barcode file contains 4 tab-separated columns. These are the literal string barcode, the forward sequence, the reverse sequence, and the sample ID, respectively. For example:
barcode GGATCGTAATAC GATTATCGACGA 1AA
barcode GGTTATTTGGCG GTCGTGTAGCCT 1AB
barcode CGTGATCCGCTA ATCGCACAGTAA 1AE
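A malformed barcode file is easy to produce, for example when an editor replaces tabs with spaces. The sketch below checks the layout described above; check_barcodes is a hypothetical helper rather than something the pipeline provides, and it assumes every column is non-empty and separated by a single tab.

```shell
# check_barcodes FILE
# Hypothetical helper: reports lines that do not follow the 4-column layout
# (the string "barcode", forward sequence, reverse sequence, sample ID).
# Returns non-zero if any line is rejected.
check_barcodes() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NF != 4 {
            printf "line %d: expected 4 tab-separated columns, found %d\n", NR, NF
            bad = 1; next
        }
        $1 != "barcode" {
            printf "line %d: first column must be the string barcode\n", NR
            bad = 1; next
        }
        $2 == "" || $3 == "" || $4 == "" {
            printf "line %d: empty column\n", NR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

For example, check_barcodes data/barcodes.txt prints nothing and returns 0 when every line is well formed.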
There are two steps in the pipeline that require parameters based on the results of previous steps. Initially, when these parameters are unspecified, the pipeline will halt and report an error. This is expected: add the required parameter(s) and rerun the pipeline (make sure to specify -resume). The pipeline will proceed from where it stopped previously.
See the "Moving Pictures" tutorial for how to pick appropriate values for these parameters. The relevant output files will be found in v11nDir (default: out/visualization) and rawDir (default: out/raw-data).
Two parameters are required for this step:
- truncF is the position at which the forward sequences should be truncated due to a drop-off in quality.
- truncR is the position at which the reverse sequences should be truncated due to a drop-off in quality.
One parameter (samplingDepth) is required during the alpha and beta diversity analysis process. This is the total frequency that each sample should be rarefied to prior to computing the diversity metrics.
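Choosing a depth is a trade-off: in QIIME2's diversity analysis, samples whose total frequency is below the chosen depth are dropped. The sketch below counts how many samples would survive a candidate depth. Both samples_retained and its input file are hypothetical: the input is assumed to be a two-column, tab-separated table (sample ID, total frequency) that you prepare yourself, not a file the pipeline writes.

```shell
# samples_retained DEPTH FILE
# Hypothetical helper: FILE holds one sample per line as
# "<sample ID><TAB><total frequency>". Prints how many samples have a total
# frequency of at least DEPTH, i.e. would be kept after rarefying to DEPTH.
samples_retained() {
    awk -F '\t' -v depth="$1" '
        $2 + 0 >= depth + 0 { kept++ }
        END {
            printf "%d of %d samples have a total frequency of at least %d\n", kept, NR, depth
        }
    ' "$2"
}
```

Calling samples_retained with a few candidate depths gives a quick view of how many samples each choice would cost.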
Some of the underlying QIIME2 plugins also take optional parameters. Below is a list of the parameter name to use for each step and where to find documentation for these parameters.
- demultiplexing: demuxExtra (cutadapt docs)
- summarizing demultiplexed sequences: demuxSumExtra (demux docs)
- denoising: denoiseExtra (dada2 docs)
- visualizing the denoising stats: visualizeDenoiseStatsExtra (metadata docs)
- building phylogenetic trees: phylogenyExtra (phylogeny docs)
- classifying taxonomy: taxonomyClassificationExtra (feature-classifier docs)
- visualizing taxonomy: visualizeTaxonomyExtra (metadata docs)
- reads: wildcard pattern to look for input reads (default: "data/*R{1,2}*.fastq{,.gz}")
- barcode: pattern to look for the barcode file (default: "data/*.txt")
- classifier: ML model to use for taxonomy classification (default: https://data.qiime2.org/2019.7/common/gg-13-8-99-515-806-nb-classifier.qza)
- prefix: prefix to add to output files (default: name of the current working directory)
- outdir: output directory
- v11nDir: QIIME2 visualizations directory (default: ${outdir}/visualization)
- rawDir: raw data from QIIME2 artifacts directory (default: ${outdir}/raw-data)
Some tasks in the pipeline can make use of multiple cores. These are tagged with the label multithreaded inside the main pipeline script. By default, these processes run with a single core. To allocate more cores, add the following snippet to nextflow.config, replacing 4 with the number of cores to use.
process {
    withLabel: multithreaded {
        cpus = 4
    }
}
/* nextflow.config */
params {
    // Customize where to find the input
    reads = "sample-r{1,2}.fq.gz"
    barcode = "barcode.txt"

    // Customize prefix for output files
    prefix = "my-awesome-project"

    classifier = "https://data.qiime2.org/2019.7/common/silva-132-99-nb-classifier.qza"

    // Extra parameter for qiime
    taxonomyClassificationExtra = "--p-confidence 0.8"

    truncF = 200
    truncR = 210
    samplingDepth = 5000
}

process {
    // Set processes tagged with multithreaded to use more threads
    withLabel: multithreaded {
        cpus = 12
    }

    // Customize resource for a specific step
    withName: buildPhylogeneticTrees {
        memory = 32.GB
    }
}

// Change where to save conda environment(s)
conda.cacheDir = "/home/bob/nextflow-conda-envs"

// Send a notification email when the pipeline terminates
notification.enabled = true
notification.to = "bob@example.com"
See Configuration for more details.
QIIME2 produces two types of output: artifacts (.qza) and visualizations (.qzv). You can interact with these files using either the QIIME2-provided CLI or the artifact API.
Underneath, these files are really just zip archives that contain the data and some additional metadata/information. You can use unzip to unpack the file and inspect its content directly.
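As a rough illustration of inspecting an archive without unpacking it, the sketch below lists the contents and prints any metadata.yaml it finds. qza_metadata is a hypothetical helper; it assumes the Info-ZIP unzip tool is installed and that the archive stores its metadata in files named metadata.yaml.

```shell
# qza_metadata FILE
# Hypothetical helper: lists the contents of a .qza/.qzv archive, then prints
# the content of every metadata.yaml stored inside it.
qza_metadata() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    # -l lists the archive members without extracting them
    unzip -l "$1"
    # -p writes matching members to stdout; the wildcard is handled by unzip
    unzip -p "$1" '*/metadata.yaml'
}
```

To unpack everything instead, unzip FILE -d some-directory extracts the archive into some-directory.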
Nextflow fails to download file from an https source with the error javax.net.ssl.SSLHandshakeException: Received fatal alert: handshake_failure
- Download the Java Cryptography Extension (JCE) zip file for Java 8 from here.
- Uncompress the downloaded archive
- Move local_policy.jar and US_export_policy.jar to $JAVA_HOME/jre/lib/security. If there is no jre/ subdirectory under $JAVA_HOME/, move to $JAVA_HOME/lib/security instead
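The last step above can be sketched as a small shell function. install_jce_policy is a hypothetical helper, and it assumes JAVA_HOME is set, the target security directory already exists, and you have write permission to it (system-wide Java installs may require elevated privileges).

```shell
# install_jce_policy DIR
# Hypothetical helper: DIR is the folder holding the two unpacked policy jars.
install_jce_policy() {
    src=$1
    [ -n "$JAVA_HOME" ] || { echo "JAVA_HOME is not set" >&2; return 1; }
    # Prefer jre/lib/security; fall back when there is no jre/ subdirectory
    if [ -d "$JAVA_HOME/jre" ]; then
        dest="$JAVA_HOME/jre/lib/security"
    else
        dest="$JAVA_HOME/lib/security"
    fi
    mv "$src/local_policy.jar" "$src/US_export_policy.jar" "$dest/"
}
```

Since this overwrites any policy jars already in that directory, consider backing them up first.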
qiime2-nf was originally written by Ha Le and Jake Carlson.