BioHansel Manuscript Supplementary Data Repository

This repository contains the supplementary data for the BioHansel manuscript. The directory structure mirrors the manuscript: each dataset sits under the section in which it is referenced.

Methods

Canonical SNP genotyping schemas

BioHansel scheme development documentation: https://bio-hansel.readthedocs.io/en/readthedocs/user-docs/genotyping_schemes.html#creating-a-genotyping-scheme

BioHansel Schemas

Population structure backbones

BioHansel

Examples of BioHansel output files

BioHansel metadata tables

Source Code

Source code is available at https://github.com/phac-nml/biohansel/

Analysis of retrospective outbreaks with BioHansel

BioHansel SNP detection validation

BioHansel and Snippy SNP detection concordance for S. Typhi

See https://github.com/peterk87/nf-biohansel-snippy-comparison/blob/master/main.nf

Contamination Detection

BioHansel and Snippy SNP calling concordance on artificial contamination datasets

Reference genome sequences used for Snippy analysis

For the Python scripts used to extract the detected variants, see https://github.com/jrober84/bio_hansel_benchmarking

Contamination detection at different coverage cutoffs

Selecting representative datasets for the five genotyping schemas

Scripts

BioHansel compute performance

See https://github.com/peterk87/nf-biohansel-sra-benchmark for the BioHansel and Snippy benchmarking workflow

Selection of datasets and construction of artificial sequencing data

Selection of datasets for additional benchmarking

Generation of artificial sequencing data representing various levels of genome coverage

Reference genome sequences

Determination of runtime and memory usage

Results and Discussion

BioHansel quickly and accurately genotypes isolates from WGS data

BioHansel results for SH outbreaks

SNVPhyl run for SH outbreaks

BioNumerics results for SH outbreaks

Outbreak analysis results

BioHansel’s genotyping results have high concordance with traditional SNP calling workflows

BioHansel detects contamination in synthetic WGS reads

BioHansel’s speed and memory performance compare favourably to traditional SNP-calling pipelines