PikaVirus

Primary language: Python. License: MIT.

A workflow for metagenomics.

Introduction

PikaVirus is a bioinformatics best-practice pipeline for metagenomic analysis. It follows a new approach based on eliminatory k-mer analysis, followed by assembly and subsequent contig binning.

The pipeline is built using Nextflow, a workflow tool that runs tasks portably across multiple compute infrastructures. It comes with Docker containers, which make installation trivial and results highly reproducible.

Quick Start

  1. Install Nextflow

  2. Install any of Docker, Singularity or Podman for full pipeline reproducibility (please only use Conda as a last resort; see docs)

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run nf-core/pikavirus -profile test,<docker/singularity/podman/conda/institute>

    Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.

  4. Start running your own analysis!

    nextflow run nf-core/pikavirus -profile <docker/singularity/podman/conda> --input '*_R{1,2}.fastq.gz'
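The `--input` pattern above, `'*_R{1,2}.fastq.gz'`, is expected to match paired-end files that share a sample prefix. As a minimal sketch of that naming convention (not the pipeline's own code), the Python snippet below groups file names into R1/R2 pairs. The helper name `group_read_pairs`, the example file names, and the choice to drop samples with a missing mate are all assumptions made for illustration; this README does not say how the pipeline treats incomplete pairs.

```python
import re
from collections import defaultdict

# Matches names such as "sampleA_R1.fastq.gz" and captures the sample
# prefix and the read number (1 or 2).
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def group_read_pairs(filenames):
    """Group FASTQ file names by sample prefix, keeping complete R1/R2 pairs.

    Hypothetical helper: samples with only one mate are dropped here,
    which is an assumption of this sketch.
    """
    by_sample = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match:
            by_sample[match["sample"]][match["read"]] = name
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(by_sample.items())
        if "1" in reads and "2" in reads
    }


if __name__ == "__main__":
    files = [
        "sampleA_R1.fastq.gz",
        "sampleA_R2.fastq.gz",
        "sampleB_R1.fastq.gz",  # no R2 mate, so it is skipped
        "notes.txt",            # does not match the pattern
    ]
    for sample, (r1, r2) in group_read_pairs(files).items():
        print(sample, r1, r2)
```

Running a check like this on your input directory before launching the pipeline can help spot misnamed files or missing mates early.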

See usage docs for all of the available options when running the pipeline.

Pipeline Summary

By default, the pipeline currently performs the following:

  • Sequencing quality control (FastQC)
  • Trimming of low-quality regions in the reads (FastP)
  • Trimmed sequences quality control (FastQC)
  • Identification and isolation of viral, bacterial, fungal and unknown reads (Kraken2)
  • Assembly of unknown reads (MetaQuast) and mapping against databases (Kaiju) to identify possible new pathogens
  • Selection of suitable viral, bacterial and fungal references from the provided directory (Mash)
  • Alignment of viral, bacterial and fungal reads against reference genomes to confirm the presence of certain organisms (Bowtie2)
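The reference-selection step relies on Mash, which estimates genome distances using MinHash over k-mers. The toy Python sketch below illustrates that general idea only: it is not Mash's implementation and not the pipeline's selection logic. The function names, the SHA-1 hash, the k-mer length `k=4`, and the sketch size `size=16` are all assumptions chosen to keep the example small.

```python
import hashlib


def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def bottom_sketch(sequence, k, size):
    """Keep the `size` smallest k-mer hashes as a compact sketch."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmers(sequence, k)
    }
    return sorted(hashes)[:size]


def estimate_jaccard(sketch_a, sketch_b, size):
    """Estimate k-mer set similarity (Jaccard index) from two sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


def pick_closest_reference(sample, references, k=4, size=16):
    """Return the name of the most similar reference and all scores.

    Hypothetical helper; `references` maps a name to a sequence string.
    """
    sample_sketch = bottom_sketch(sample, k, size)
    scores = {
        name: estimate_jaccard(sample_sketch, bottom_sketch(seq, k, size), size)
        for name, seq in references.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = pick_closest_reference(
        "ACGTACGTAC",
        {"same": "ACGTACGTAC", "other": "TTTTTTTTTT"},
    )
    print(best, scores)
```

Comparing small sketches instead of full k-mer sets is what makes screening a sample against a large reference directory cheap. Real tools use much longer k-mers and larger sketches than this toy example.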

Documentation

The nf-core/pikavirus pipeline comes with documentation about the pipeline: usage and output.

Credits

PikaVirus 2.0 was originally written by Guillermo Jorge Gorines Cordero, under the supervision of the BU-ISCIII team in Madrid, Spain.

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #pikavirus channel (you can join with this invite).

Citations

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

Improved metagenomic analysis with Kraken 2.

Derrick E Wood, Jennifer Lu & Ben Langmead.

Genome Biology. 2019 Nov 28. doi: 10.1186/s13059-019-1891-0

fastp: an ultra-fast all-in-one FASTQ preprocessor.

Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu.

Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages i884–i890. doi: 10.1093/bioinformatics/bty560

Fast and sensitive taxonomic classification for metagenomics with Kaiju

Peter Menzel, Kim Lee Ng & Anders Krogh

Nature Communications, Volume 7, Article number: 11257 (2016). doi: 10.1038/ncomms11257

QUAST: quality assessment tool for genome assemblies

Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi & Glenn Tesler

Bioinformatics, Volume 29, Issue 8, 15 April 2013, Pages 1072–1075. doi: 10.1093/bioinformatics/btt086

Bioconda: sustainable and comprehensive software distribution for the life sciences

Björn Grüning, Ryan Dale, Andreas Sjödin, Brad A. Chapman, Jillian Rowe, Christopher H. Tomkins-Tinch, Renan Valieris, Johannes Köster & The Bioconda Team

Nature Methods, Volume 15, Pages 475–476 (2018). doi: 10.1038/s41592-018-0046-7

Mash: fast genome and metagenome distance estimation using MinHash

Brian D. Ondov, Todd J. Treangen, Páll Melsted, Adam B. Mallonee, Nicholas H. Bergman, Sergey Koren & Adam M. Phillippy

Genome Biology, Volume 17, Article number: 132 (2016). doi: 10.1186/s13059-016-0997-x

metaSPAdes: a new versatile metagenomic assembler

Sergey Nurk, Dmitry Meleshko, Anton Korobeynikov & Pavel A. Pevzner

Genome Res 27: 824-834 (2017). doi: 10.1101/gr.213959.116