sc2-illumina-pipeline

Bioinformatics pipeline for SARS-CoV-2 sequencing at CZ Biohub

Primary language: Nextflow. License: GNU Affero General Public License v3.0 (AGPL-3.0).

SARS-CoV-2 Consensus Genome Pipeline

This pipeline generates consensus SARS-CoV-2 genomes from FASTQ files. We use it on the following types of sequencing data:

  1. Metagenomic sequencing enriched for SARS-CoV-2 reads (protocols.io).
  2. Amplicon-based short-read sequencing (using ARTIC v3 protocol).

Typical usage

For generating consensus genomes from reads:

nextflow run czbiohub/sc2-illumina-pipeline -profile artic,docker \
    --reads '[s3://]path/to/reads/*_R{1,2}_001.fastq.gz*' \
    --kraken2_db '[s3://]path/to/kraken2db' \
    --outdir '[s3://]path/to/outdir'

The Kraken2 database (kraken2db) can be downloaded from https://genexa.ch/sars2-bioinformatics-resources/.

Replace -profile artic with -profile msspe if using MSSPE sequencing. See the documentation for more details.
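The profile swap described above can be sketched as a small shell snippet. This is only an illustration: the variable names (PROTOCOL, READS, KRAKEN2_DB, OUTDIR, CMD) are placeholders introduced for this example and are not part of the pipeline, and the paths are the same placeholder paths used in the usage command. The snippet prints the assembled command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: choose the Nextflow profile that matches the sequencing protocol.
# All variable names here are hypothetical; only the nextflow options
# (-profile, --reads, --kraken2_db, --outdir) come from the usage section.
PROTOCOL=msspe                                   # "artic" or "msspe"
READS='path/to/reads/*_R{1,2}_001.fastq.gz*'     # placeholder path
KRAKEN2_DB='path/to/kraken2db'                   # placeholder path
OUTDIR='path/to/outdir'                          # placeholder path

# Reject anything other than the two profiles named in this README.
case "$PROTOCOL" in
  artic|msspe) ;;
  *) echo "unknown protocol: $PROTOCOL" >&2; exit 1 ;;
esac

# Assemble the command; the glob stays single-quoted so the shell
# does not expand it before Nextflow sees it.
CMD="nextflow run czbiohub/sc2-illumina-pipeline -profile ${PROTOCOL},docker --reads '${READS}' --kraken2_db '${KRAKEN2_DB}' --outdir '${OUTDIR}'"

echo "$CMD"
```

With PROTOCOL=msspe the printed line uses -profile msspe,docker; paste it into a shell to start the run.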

Testing

Simple test to make sure things aren't broken:

nextflow run czbiohub/sc2-illumina-pipeline -profile docker,test

Benchmarking

Simple benchmark of mapping accuracy, not speed. Run it after algorithm changes to see how accuracy might be affected. Results are written to benchmark/call_consensus-stats/combined.stats.tsv.

nextflow run czbiohub/sc2-illumina-pipeline -profile docker,benchmark

Documentation

Documentation for czbiohub/sc2-illumina-pipeline is in the docs/ directory:

  1. Installation
  2. Running the pipeline
  3. Pipeline overview
  4. Output

Acknowledgments

The initial version of this pipeline was based on https://github.com/connor-lab/ncov2019-artic-nf