Benchmark SIGNL

This repo contains the Snakemake pipeline used for the 2019 benchmarking exercise of the special interest group for bioinformaticians in medical microbiology in the Netherlands (SIGBMMNL, usually just SIGNL).

The benchmark entails processing data for 40 Klebsiella pneumoniae and 40 vancomycin-resistant Enterococcus (VRE) isolates. Several Dutch institutions will analyse the same dataset, and the methods, results and conclusions will subsequently be compared between centres.

Installation and dependencies

The pipeline is meant to run on Linux and has three main dependencies:

Conda
Snakemake
A downloaded Kraken2 database

Conda

(Mini)conda can be installed through https://docs.conda.io/en/latest/miniconda.html#linux-installers.

Snakemake

Once Miniconda is installed, install Snakemake using:

conda install -c bioconda -c conda-forge snakemake

Kraken2 database

Kraken2 databases can be downloaded from https://ccb.jhu.edu/software/kraken2/downloads.shtml. We used the MiniKraken2_v1_8GB database (no human genome included). You can also build your own database.
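Before starting a run, it can help to confirm that the downloaded database directory is complete. The sketch below is not part of the pipeline; it assumes the usual layout of a built Kraken2 database (the files hash.k2d, opts.k2d and taxo.k2d in one directory), and the function name is only illustrative.

```python
from pathlib import Path

# Files that a built Kraken2 database directory is assumed to contain.
REQUIRED_K2_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_kraken2_files(db_dir):
    """Return the required Kraken2 files that are absent from db_dir."""
    db_path = Path(db_dir)
    return [name for name in REQUIRED_K2_FILES if not (db_path / name).is_file()]
```

Calling missing_kraken2_files() on the directory where the database was unpacked returns an empty list when all three files are present, and the names of the missing files otherwise.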

Pipeline

We used snakemake version 5.7.1 to run the pipeline.

The pipeline entails several steps, depicted in rulegraph.svg. In short:

Genome assembly using SKESA
SNP alignment using SKA
Pairwise SNP counts using snp-dists
QC using Kraken2, fastp, QUAST and MultiQC
AMR gene identification using ABRicate
Clustering/phylogenetics using IQ-TREE (from the SKA alignment) and PopPUNK
MLST, mainly for backwards compatibility and as an extra check
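To make the SNP counting step more concrete, here is a simplified illustration of what a pairwise SNP count over an alignment means. It is not the snp-dists implementation and is not used by the pipeline. It assumes that only positions where both sequences carry an unambiguous A, C, G or T are counted, and the isolate names and sequences are made up.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have a different A/C/G/T base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in VALID_BASES and b in VALID_BASES
    )


def pairwise_snp_distances(alignment):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Toy alignment with hypothetical isolate names; not data from the benchmark.
toy_alignment = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGTACGA",
    "isolate_3": "ANGTTCGA",
}
distances = pairwise_snp_distances(toy_alignment)
```

In this toy example isolate_1 and isolate_3 differ at two counted positions; the N in isolate_3 is skipped rather than counted as a difference.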