
Rapid core genome SNP alignments from multiple bacterial genomes

Primary language: Perl. License: GNU General Public License v2.0 (GPL-2.0).

wombac

WARNING! Wombac is no longer supported. All development has moved to Snippy at https://github.com/tseemann/snippy

Synopsis

Wombac rapidly finds core genome SNPs from samples and produces an alignment of those SNPs which can be used to build a phylogenomic tree. It can handle hundreds of samples and uses multiple CPUs on a single system efficiently. Computations can be re-used for building new trees when new samples are added, saving lots of time. Wombac only looks for substitution SNPs, not indels, and it may miss some SNPs, but it will find enough to build high-resolution trees.

Input

Wombac needs a reference genome in FASTA format (can be in multiple contigs) and a series of samples. A sample can either be:

  • a folder containing FASTQ short reads: eg. R1.fq.gz R2.fq.gz
  • a multi-FASTA file: eg. contigs.fa or NC_273461.fna
  • a .tar.gz file containing FASTA contig files: eg. Ecoli_K12mut.contig.tar.gz (from EBI/NCBI)
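The three sample types above can be told apart by path alone. The sketch below shows one way to do that before launching a run; `sample_kind` is a hypothetical helper written for illustration and is not Wombac's own detection logic, and the list of FASTA extensions is an assumption.

```shell
# Hypothetical helper: report which kind of sample a path looks like.
# It mirrors the three cases listed above; it is NOT how Wombac itself decides.
sample_kind() {
    path=$1
    if [ -d "$path" ]; then
        # A directory is assumed to hold FASTQ reads
        echo "reads-folder"
    else
        case "$path" in
            *.tar.gz)            echo "contig-tarball" ;;
            *.fa|*.fna|*.fasta)  echo "multi-fasta" ;;   # assumed extension list
            *)                   echo "unknown" ;;
        esac
    fi
}

# Print one "path<TAB>kind" line per argument given to the script
for s in "$@"; do
    printf '%s\t%s\n' "$s" "$(sample_kind "$s")"
done
```

Any path reported as "unknown" is worth checking by hand before it is passed to Wombac.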

Output

Wombac produces standards-compliant / machine-readable output files.

  • a BAM & BAI index for each input sample
  • all.vcf containing the joint multisample variant calls
  • VCF (per sample) and an overall .ALN (FASTA aligned core SNPs).
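Because the core SNP alignment is plain aligned FASTA, it can be sanity-checked with standard tools. The sketch below counts the sequences and reports the alignment length; `aln_summary` is a hypothetical helper, and it assumes every sequence in the file has the same length, as an alignment should.

```shell
# Hypothetical helper: print "<number of sequences> <alignment length>"
# for an aligned FASTA file. Assumes all sequences have equal length.
aln_summary() {
    awk '
        /^>/   { n++; next }            # a header line starts a new sequence
        n == 1 { len += length($0) }    # add up the lines of the first sequence only
        END    { print n + 0, len + 0 } # "+ 0" prints 0 instead of blank for an empty file
    ' "$1"
}
```

For example, `aln_summary Tree/core.aln` prints the number of samples in the alignment followed by the number of core SNP sites.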

Etymology

The name Wombac combines "bac" (for "Bacteria") with "Wombat" (to represent its Australian origin), an animal with a very solid core!

Usage

% ls -R
K12.fna 
EcPoo.fasta 
EHEC.contigs.fa 
UPEC/R1.fq.gz UPEC/R2.fq.gz
EPEC/R1.fastq EPEC/R2.fastq
APEC/s_1_sequence.txt
K12mut.contigs.tar.gz

% wombac --outdir Tree --ref K12.fna --run EcPoo.fasta EHEC.contigs.fa UPEC/ EPEC/ APEC/ K12mut.contigs.tar.gz
(wait a while)

% figtree Tree/core.nex
(play with the ML tree)

% SplitsTree -i Tree/core.aln
(draw different trees from the core SNP alignment)

% less Tree/core.csv
(have a look at the SNP evidence and coordinates)

Excluding samples from the analysis

If your tree has an outlier you do NOT need to re-run Wombac. You can use the 'wombac-core' script to regenerate the output files with different parameters, eg. remove one outlier sample.

Including sites with missing alleles

Run as normal, then use wombac-core with a new --output prefix and the --noncore option.

License

Wombac is free software, released under the GPL (version 2).

Requirements

  • Perl >= 5.6
  • bwa mem >= 0.7.10
  • samtools >= 1.1
  • freebayes >= 0.9.20
  • vcflib
  • bgzip, tabix
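
A quick way to see which of these tools are installed is to look each one up on the PATH. The sketch below does only that: `missing_tools` is a hypothetical helper, it does not check version numbers, and vcflib is omitted on the assumption that it installs as a collection of separate commands rather than a single executable.

```shell
# Hypothetical helper: print the name of every given command not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names assumed from the requirements list; versions are NOT checked.
missing=$(missing_tools perl bwa samtools freebayes bgzip tabix)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All listed executables found"
fi
```

Minimum versions still need to be confirmed by hand with each tool's own version option.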