This repository contains the supplementary data for the BioHansel manuscript, organized by the manuscript section in which each item is referenced.
The directory structure of this repo therefore mirrors the section structure of the manuscript.
BioHansel scheme development documentation: https://bio-hansel.readthedocs.io/en/readthedocs/user-docs/genotyping_schemes.html#creating-a-genotyping-scheme
- Appendix I Additional info BioHansel schema development and adaptation.docx
- Table S1 Backbone positions 5 schemes.xlsx
- Schema 1 SH_scheme_v0.5.0.fasta
- Schema 2 SE_scheme_v1.0.7.fasta
- Schema 3 ST_scheme_v0.5.5.fasta
- Schema 4 Typhi_scheme_v1.2.0.fasta
- Schema 5 tb_speciation_scheme_v1.0.5.fasta
- Fig S1a Enteritidis backbone v1.0.7.pdf
- Fig S1b Heidelberg backbone v0.5.0.pdf
- Fig S1c Typhimurium backbone v0.5.5.pdf
- Fig S1d Typhi backbone v1.2.0.pdf
- Fig S1e MTB backbone v1.0.5.pdf
- Caption trees Fig S1a-e.txt
- Table S2a BioHansel Simple Summary genome NC_011083 tech results.tab
- Table S2b BioHansel Results Summary genome NC_011083 results.tab
- Table S2c BioHansel Detailed K-mer Results genome NC_011083 match results.tab
- Table S2d BioHansel Simple Summary unassembled reads SH outbreaks tech results.tab
- Table S2e BioHansel Results Summary unassembled reads SH outbreaks results.tab
- Table S2f BioHansel Detailed K-mer Results unassembled reads SH outbreaks match results.tab
- Table S3a tb_speciation_scheme_v1.0.5_metadata_Coll_et_al_genotypes.txt
- Table S3b Typhi_scheme_v1.2.0_metadata_Wong_et_al_genotypes.txt
BioHansel source code is available at https://github.com/phac-nml/biohansel/
- Table S4 SH Outbreaks Bekal and coll.tab
- Reference genome for SNVPhyl analysis of SH outbreaks: SH_ref_SL476_NC_011083.1.fasta
See the Nextflow workflow script at https://github.com/peterk87/nf-biohansel-snippy-comparison/blob/master/main.nf
- Table S5 List of 1000 Salmonella Typhi Accessions.txt
- Worksheet 1 Detailed parameter settings for all programs used for validation
- Raw Nextflow workflow results for Jupyter Notebook
- Jupyter Notebook to tabulate Nextflow workflow results
- HTML version of Jupyter Notebook to tabulate Nextflow workflow results
- Table S6 List of genomes used to generate artificial contamination datasets
- Worksheet 2 Bash and Seqtk Commands.xlsx
- AE006468.2_ref_ST_LT2.fasta
- CP012921.1_ref_SH.fasta
- NC_000962.3_ref_MTB_H37Rv.fasta
- NC_003198.1_ref_Typhi_CT18.fasta
- NC_011294.1_ref_SE_P125109.fasta
For Python scripts used to extract the detected variants, see https://github.com/jrober84/bio_hansel_benchmarking
- Table S7 List of ST accessions for generation of contaminated datasets.txt
- Table S8 List of SE accessions for generation of contaminated datasets.txt
- Table S9 List of Typhi accessions for generation of contaminated datasets.txt
- Table S10 List of MTB accessions for generation of contaminated datasets.txt
- Table S11 List of SH accessions for generation of contaminated datasets.txt
- Script 1 create_reads.sh
- Script 2 contaminate.sh
- Appendix II Additional info Generation of artificially contaminated datasets.pdf
See https://github.com/peterk87/nf-biohansel-sra-benchmark for the BioHansel and Snippy benchmarking workflow.
- Table S13 List of 1000 Salmonella Enteritidis strains.tab
- Table S14 List of 1000 Salmonella Heidelberg strains.tab
- Table S15 List of 1000 Salmonella Typhi strains.tab
- Table S16 List of 1000 Salmonella Typhimurium strains.tab
- AE006468.2_ref_ST_LT2.fasta
- CP012921.1_ref_SH.fasta
- NC_000962.3_ref_MTB_H37Rv.fasta
- NC_003198.1_ref_Typhi_CT18.fasta
- NC_011294.1_ref_SE_P125109.fasta
- Jupyter Notebook to tabulate Nextflow workflow results
- HTML version of Jupyter Notebook
- Nextflow workflow output file
- Table S17a BioHansel tech results SH outbreaks.txt
- Table S17b BioHansel results SH outbreaks.txt
- Table S17c BioHansel match results SH outbreaks.txt
- Figure S2 BioNumerics Dendrogram
- Table S18 BioNumerics wgMLST similarity matrix.xlsx
- Table S19 BioNumerics wgMLST character data.xlsx
- Table S4 SH Outbreaks Bekal and coll.tab
- Table S20 Heidelberg backbone tree sequences and genome positions BioHansel scheme v0.5.0.txt
- Table S21 SH outbreaks SNV distance matrix.txt
- Table S22 BioHansel vs Snippy Coverage Comparison 1000 Typhi datasets
- Table S23 Genotyping results for BioHansel vs Snippy Accuracy Benchmarking Typhi.txt
- Table S24 Genotyping results summary for BioHansel vs Snippy Accuracy Benchmarking Typhi.txt
- Table S25 Metadata for 1910 Typhi strains.txt
- Table S26 BioHansel Results for 1910 Typhi strains.txt
- Appendix IV Analysis of the 16 Typhi samples that failed BioHansel QC.pdf
- Figure S3 Comparison of base detection results in synthetically contaminated datasets between BioHansel and Snippy.pdf
- Table S27 Comparison base detection in mixed genomes for BioHansel and Snippy.txt
- Table S28 Comparison of base detection at each target SNP position in mixed genomes for BioHansel and Snippy.xlsx
- Table S29 Results Summary per scheme for comparison base detection in mixed genomes for BioHansel and Snippy.xlsx
- Table S30 Results Summary comparison base detection in mixed genomes BioHansel and Snippy.pdf
- Table S31 BioHansel QA&QC results with synthetic contaminated SE datasets.xlsx
- Table S32 BioHansel QA&QC results with synthetic contaminated SH datasets.xlsx
- Table S33 BioHansel QA&QC results with synthetic contaminated Typhi datasets.xlsx
- Table S34 BioHansel QA&QC results with synthetic contaminated ST datasets.xlsx
- Table S35 BioHansel QA&QC results with synthetic contaminated MTB datasets.xlsx