
Viral genome coverage evaluation for metagenomic diagnostics 🩸

Primary language: Rust. License: Apache License 2.0 (Apache-2.0).

vircov

Automated coverage statistics, genome recovery and subtyping for metagenomic diagnostics of viral infections from reads or alignments.

Overview

v1.0.0

Install

Anaconda installation with dependencies:

mamba create -n vircov -c esteinig vircov

Dependencies for full pipeline from reads:

  • bowtie2 | minimap2 | strobealign
  • samtools
  • ivar

Source installation without dependencies:

git clone https://github.com/esteinig/vircov
cd vircov && cargo build --release 

Usage

Virus detection and whole genome recovery


Virus population graphs and genomic neighbor typing


Reference databases and subtyping schemes

We make automatically updated subtyping databases and parsed genotype annotation schemes available for a range of common viral pathogens, at least where data-sharing arrangements make this possible (e.g. NCBI- but not GISAID™-derived assemblies). Scheme extractions and release updates are checked and, if necessary, corrected by our bioinformatics team at the Victorian Infectious Diseases Reference Laboratory (VIDRL) in Melbourne.

Concepts

Low-abundance infections and coverage assessment for detection from reads

Definitive viral diagnosis from metagenomic clinical samples can be extremely challenging due to low sequencing depth, large amounts of host reads and low infectious titres.

One way to distinguish a positive viral diagnosis is to look at alignment coverage against one or more reference sequences. When only a few reads map to the reference, and genome coverage is low, positive infections often display multiple distinct alignment regions, as opposed to reads mapping to a single region or a few regions on the reference. De Vries et al. (2021) summarize this concept succinctly in this figure (adapted):

[Figure: adapted from De Vries et al. (2021)]

In these cases, positive calls can be made from coverage plots showing the distinct alignment regions; the authors choose a minimum threshold on the number of regions (> 3). Vircov computes the number of distinct alignment regions as part of the genome recovery module.
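To make the idea of distinct alignment regions concrete, here is a minimal sketch in Rust. It is not Vircov's actual implementation: the function names `merge_regions` and `coverage_fraction`, the interval convention (0-based, half-open) and the example coordinates are assumptions made for illustration. It merges overlapping or adjacent alignment intervals on a single reference into regions, counts them, and compares the count against the > 3 threshold mentioned above.

```rust
/// Merge alignment intervals on one reference into distinct regions.
/// Intervals are assumed to be 0-based and half-open: (start, end).
/// Hypothetical helper for illustration, not part of Vircov's API.
fn merge_regions(mut alignments: Vec<(u64, u64)>) -> Vec<(u64, u64)> {
    // Sort by start (then end) so overlapping intervals become neighbours.
    alignments.sort_unstable();
    let mut regions: Vec<(u64, u64)> = Vec::new();
    for (start, end) in alignments {
        if let Some(last) = regions.last_mut() {
            // Overlapping or directly adjacent: extend the current region.
            if start <= last.1 {
                last.1 = last.1.max(end);
                continue;
            }
        }
        // Otherwise a gap separates this alignment from the previous region.
        regions.push((start, end));
    }
    regions
}

/// Fraction of the reference covered by the merged regions.
fn coverage_fraction(regions: &[(u64, u64)], reference_length: u64) -> f64 {
    let covered: u64 = regions.iter().map(|(start, end)| end - start).sum();
    covered as f64 / reference_length as f64
}

fn main() {
    // Made-up alignments of five reads against a 10 kb reference.
    let alignments = vec![
        (100, 250),
        (200, 350),
        (1_000, 1_150),
        (5_000, 5_150),
        (9_000, 9_150),
    ];
    let regions = merge_regions(alignments);

    // Threshold taken from the text above (more than 3 distinct regions).
    let min_regions = 3;

    println!("distinct regions: {}", regions.len());
    println!("coverage fraction: {:.3}", coverage_fraction(&regions, 10_000));
    println!("passes region threshold: {}", regions.len() > min_regions);
}
```

In this example the first two reads overlap and collapse into one region, so five alignments yield four distinct regions at low overall coverage, which is the pattern described for low-abundance positives. A real implementation would also need to handle multiple reference sequences and alignment filtering, which this sketch leaves out.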

Genomic neighbor typing using viral population graphs


Etymology

Not a very creative abbreviation of "virus coverage", but the little spectacles in the logo are a reference to Rudolf Virchow. His surname is pronounced like vircov if you mumble the terminal v.