A Nextflow pipeline for running ART, with modifications to support amplicon read simulation and optional user-supplied depths for each amplicon.
This pipeline uses ART to generate simulated reads from input FASTA files. Given a FASTA file and a primer BED file, the pipeline generates amplicon-specific reads. Optionally, the user can provide a CSV file specifying the depth of each amplicon. If no such file is provided, reads are generated for all amplicons in equal proportions, using the `depth` parameter.
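The amplicon-splitting step can be pictured with a minimal Python sketch. It assumes ARTIC-style primer names in the BED file (`<scheme>_<n>_LEFT` / `<scheme>_<n>_RIGHT`, optionally with an `_alt` suffix), a single reference sequence held as a string (FASTA parsing and the chromosome column are left out), and that each amplicon spans from the start of its left primer to the end of its right primer. `extract_amplicons` is a hypothetical name; the pipeline's own `convertFastaToAmplicons` process may name, trim, or group amplicons differently.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical sketch, not the pipeline's convertFastaToAmplicons process.
# Assumes ARTIC-style primer names, e.g. "scheme_1_LEFT" or "scheme_1_RIGHT_alt1".
PRIMER_NAME = re.compile(r"^(.+)_(?:LEFT|RIGHT)(?:_alt\d*)?$")


def extract_amplicons(reference: str, bed_lines: Iterable[str]) -> Dict[str, str]:
    """Return {amplicon: sequence}, spanning left-primer start to right-primer end."""
    bounds: Dict[str, List[int]] = {}
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4 or line.startswith("#"):
            continue  # skip blank, comment, or truncated lines
        start, end, name = int(fields[1]), int(fields[2]), fields[3]
        match = PRIMER_NAME.match(name)
        if match is None:
            raise ValueError(f"Unrecognised primer name: {name}")
        # Keep the outermost coordinates seen for each amplicon (covers alt primers).
        span = bounds.setdefault(match.group(1), [start, end])
        span[0] = min(span[0], start)
        span[1] = max(span[1], end)
    # BED intervals are 0-based and half-open, so a plain slice is exact.
    return {amplicon: reference[lo:hi] for amplicon, (lo, hi) in bounds.items()}
```

With a left primer at 0-4 and a right primer at 12-16, the amplicon is `reference[0:16]`, primers included; trimming the primer intervals instead would give primer-free inserts.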
```mermaid
flowchart TD
fasta[fasta directory] --> ampToFasta[convertFastaToAmplicons]
primer_bed[primer.bed] --> ampToFasta[convertFastaToAmplicons]
ampToFasta --> ART[runART] --> fastq[fastq]
amplicon_depths[amplicon_depths.csv] --> VariableART[runArtVariableDepths]
ampToFasta --> VariableART[runArtVariableDepths] --> VariableFastq[fastq with user specified individual amplicon depths]
```
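The branch between `runART` and `runArtVariableDepths` amounts to choosing a depth for each amplicon. The minimal Python sketch below assumes the depths CSV has a header row with `amplicon` and `depth` columns and that its amplicon names match those derived from the primer BED. With no depth file, every amplicon gets the single `depth` value; otherwise each amplicon takes its listed depth. `resolve_depths` is a hypothetical name, and raising an error for an amplicon missing from the CSV is a design choice of this sketch, not documented pipeline behaviour.

```python
import csv
from typing import Dict, Iterable, Optional


def resolve_depths(
    amplicons: Iterable[str],
    default_depth: float = 50,
    depths_csv: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Map each amplicon name to the depth to simulate for it (hypothetical helper)."""
    names = list(amplicons)
    if depths_csv is None:
        # runART-style branch: every amplicon gets the same `depth` value.
        return {name: default_depth for name in names}
    # runArtVariableDepths-style branch: look up each amplicon in the CSV,
    # which is assumed to have a header row "amplicon,depth".
    listed = {row["amplicon"]: float(row["depth"]) for row in csv.DictReader(depths_csv)}
    missing = [name for name in names if name not in listed]
    if missing:
        raise ValueError(f"No depth given for amplicon(s): {', '.join(missing)}")
    return {name: listed[name] for name in names}
```

An open file works as `depths_csv`, e.g. `resolve_depths(names, depths_csv=open("amplicon_depths.csv", newline=""))`, since `csv.DictReader` accepts any iterable of lines.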
```bash
nextflow run BCCDC-PHL/amplicone -profile conda \
--bed /path/to/primers.bed \
--fasta_dir /path/to/fasta_directory \
--model_R1 /path/to/error_model_R1 \
--model_R2 /path/to/error_model_R2 \
--outdir /path/to/outputs
```
An up-to-date version of Nextflow is required because the pipeline is written in DSL2. Follow the instructions at https://www.nextflow.io/ to download and install Nextflow.
The repo contains an `environment.yml` file from which the correct conda environment is built automatically when `-profile conda` is specified in the command.
`--cache /some/dir` can be specified to store the conda environment in a fixed, shared location so that multiple runs of the workflow can reuse it.
Important config options are:
| Option | Default | Description |
|---|---|---|
| `vary_amplicon_depths` | `false` | Set to `true` if supplying individual amplicon depths |
| `amplicon_depths` | `NO_FILE` | CSV file with `amplicon` and `depth` columns, giving a depth for each amplicon in the primer BED file |
| `depth` | `50` | Depth applied to every amplicon when individual amplicon depths are not supplied |
| `fragment_mean` | `600` | Mean genomic fragment size (bp) |
| `fragment_sd` | `75` | Standard deviation of genomic fragment size (bp) |
| `read_length` | `150` | Simulated read length (bp) |
| `model_R1` | `NO_FILE` | Error profile for R1 reads |
| `model_R2` | `NO_FILE` | Error profile for R2 reads |
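A file for the `amplicon_depths` option can be bootstrapped from the primer BED. The minimal Python sketch below assumes ARTIC-style primer names (`<scheme>_<n>_LEFT` / `<scheme>_<n>_RIGHT`, optionally with an `_alt` suffix), a header row `amplicon,depth` matching the column names in the table above, and that the pipeline identifies amplicons by these derived names. `depth_template` is a hypothetical helper, not part of the pipeline.

```python
import csv
import io
import re
from typing import Iterable, List

# Assumes ARTIC-style primer names, e.g. "scheme_7_LEFT" or "scheme_7_RIGHT_alt2".
PRIMER_NAME = re.compile(r"^(.+)_(?:LEFT|RIGHT)(?:_alt\d*)?$")


def depth_template(bed_lines: Iterable[str], depth: float = 50) -> str:
    """Build "amplicon,depth" CSV text with one row per amplicon, in BED order."""
    amplicons: List[str] = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4 or line.startswith("#"):
            continue  # skip blank, comment, or truncated lines
        match = PRIMER_NAME.match(fields[3])
        # Primer names that do not match the assumed pattern are ignored.
        if match and match.group(1) not in amplicons:
            amplicons.append(match.group(1))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["amplicon", "depth"])
    for name in amplicons:
        writer.writerow([name, depth])
    return buffer.getvalue()
```

For example, `out.write(depth_template(bed))`, with `bed` an open `primers.bed` and `out` an open `amplicon_depths.csv`, writes a template with every depth set to 50. After editing the depths, the file would be passed with `--vary_amplicon_depths true --amplicon_depths /path/to/amplicon_depths.csv`, assuming these options can be overridden on the command line like the parameters in the example command.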
A subdirectory for each process in the workflow is created in `--outdir`.