Fork of tseemann's mlst, modified for cgMLST. It is slow when used with organisms that have a large number of allelic variants.

cgmlst

Fork of Torsten Seemann's excellent mlst tool, modified for cgMLST. The supported schemes are campylobacter, ecoli and Lmono; others may work as well. The text below was crudely adapted from the mlst readme.

Quick Start

% cgmlst --scheme=ecoli contigs.fa > output.tsv
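
To process several assemblies in one go, the same command can be wrapped in a small script. The following is only a sketch: it assumes that cgmlst is on your PATH and that it is called once per file exactly as in the command above. The helper names build_command and run_batch are hypothetical and not part of cgmlst.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(scheme: str, fasta: Path) -> List[str]:
    """Build the documented call: cgmlst --scheme=<scheme> <file>."""
    return ["cgmlst", f"--scheme={scheme}", str(fasta)]


def run_batch(scheme: str, fastas: List[Path], out_path: Path) -> None:
    """Run cgmlst once per file and append each result to out_path.

    Assumption: cgmlst is on PATH and prints its result to stdout.
    """
    with out_path.open("w") as out:
        for fasta in fastas:
            done = subprocess.run(
                build_command(scheme, fasta),
                capture_output=True,
                text=True,
                check=True,  # stop on the first failing genome
            )
            out.write(done.stdout)


# Example call (not executed here):
# run_batch("ecoli", sorted(Path(".").glob("*.fa")), Path("output.tsv"))
```

Running one process per genome keeps a failure in one file from silently affecting the others.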

Installation

Source

% cd $HOME
% git clone https://github.com/aldertzomer/cgmlst.git
% cd cgmlst
% bash getdb.sh  # needs wget; may take some time on slow connections

Dependencies

  • NCBI BLAST+ blastn
    • You probably have blastn installed already.
  • Perl modules Moo and List::MoreUtils
    • Debian: sudo apt-get install libmoo-perl liblist-moreutils-perl
    • Red Hat: sudo yum install perl-Moo perl-List-MoreUtils
    • Most Unix: sudo cpan Moo List::MoreUtils
  • Wget
    • Debian: sudo apt-get install wget
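
The dependencies listed above can be checked before the first run. This is a minimal sketch, not something shipped with cgmlst: the function names are hypothetical, and it relies on the standard `perl -MModule -e 1` idiom, which exits with a non-zero status when a module cannot be loaded.

```python
import shutil
import subprocess
from typing import List


def missing_tools(tools: List[str]) -> List[str]:
    """Return the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_perl_modules(modules: List[str]) -> List[str]:
    """Return the Perl modules that fail to load."""
    if shutil.which("perl") is None:
        # Without perl, nothing can be loaded.
        return list(modules)
    missing = []
    for module in modules:
        proc = subprocess.run(
            ["perl", f"-M{module}", "-e", "1"],
            capture_output=True,
        )
        if proc.returncode != 0:
            missing.append(module)
    return missing


# Example call (not executed here):
# print(missing_tools(["blastn", "wget", "perl"]))
# print(missing_perl_modules(["Moo", "List::MoreUtils"]))
```

Two empty lists mean that everything named in this section was found.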

Usage

Simply give it a genome file in FASTA or GenBank format.

% cgmlst --scheme=ecoli contigs.fa

It returns a tab-separated line containing:

  • the filename
  • the ST (sequence type)
  • the allele IDs
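
A result line in this layout is easy to split in a script. The sketch below assumes the column order given above (filename, then ST, then one field per allele ID); the exact formatting of each allele field is not specified here, so the fields are kept as raw strings. The names CgmlstResult and parse_result_line are hypothetical, and the example line is made up rather than real cgmlst output.

```python
from typing import List, NamedTuple


class CgmlstResult(NamedTuple):
    filename: str
    sequence_type: str
    alleles: List[str]


def parse_result_line(line: str) -> CgmlstResult:
    """Split one tab-separated result line into filename, ST and alleles."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected at least filename and ST, got: {line!r}")
    # Everything after the ST column is treated as an allele field.
    return CgmlstResult(fields[0], fields[1], fields[2:])


# Made-up example line for illustration only.
example = "contigs.fa\t12\t3\t7\t~4\t-\n"
result = parse_result_line(example)
print(result.filename, result.sequence_type, len(result.alleles))
```

The ST is kept as a string because it may not always be a plain number.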

Available schemes

To see which schemes are supported:

% cgmlst --list

Lmono campylobacter ecoli 

Missing data

cgmlst does not just look for exact matches to full length alleles. It attempts to tell you as much as possible about what it found using the notation below:

Symbol   Meaning
n        exact intact allele
~n       novel allele similar to n
n?       partial match to known allele
n,m      multiple alleles
-        allele missing
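
The notation above can be turned into categories when summarising results. This is a sketch under one assumption: each allele call is a bare value such as 12, ~12, 12?, 3,7 or -, with n being an integer. If cgmlst wraps calls in gene names, the patterns need adjusting. The function name classify_call is hypothetical.

```python
import re
from collections import Counter


def classify_call(call: str) -> str:
    """Map one allele call to a category from the symbol table."""
    call = call.strip()
    if call == "-":
        return "missing"
    if "," in call:
        return "multiple"
    if re.fullmatch(r"~\d+", call):
        return "novel"
    if re.fullmatch(r"\d+\?", call):
        return "partial"
    if re.fullmatch(r"\d+", call):
        return "exact"
    # Anything else does not fit the assumed notation.
    return "unknown"


# Count how many loci fall into each category.
calls = ["12", "~12", "12?", "3,7", "-"]
print(Counter(classify_call(c) for c in calls))
```

The count of missing and partial calls gives a quick impression of how complete an assembly is for the chosen scheme.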

Bugs

Please submit via the GitHub Issues page: https://github.com/aldertzomer/cgmlst/issues

License

GPLv2: https://raw.githubusercontent.com/aldertzomer/cgmlst/master/LICENSE