
Inferring bacterial recombination rates from large-scale sequencing datasets.


mcorr

mcorr uses the Correlation Profile of mutations to infer recombination rates in bacteria from large-scale sequencing data.

Requirements

Installation

  1. Install mcorr-xmfa, mcorr-bam, and mcorr-fit from your terminal:
go get -u github.com/kussell-lab/mcorr/cmd/mcorr-xmfa
go get -u github.com/kussell-lab/mcorr/cmd/mcorr-bam
cd $HOME/go/src/github.com/kussell-lab/mcorr/cmd/mcorr-fit
python3 setup.py install

Alternatively, to install mcorr-fit into a user-local directory (~/.local/bin on Linux or ~/Library/Python/3.6/bin on macOS), run this instead of the previous command:

python3 setup.py install --user
  2. Add $HOME/go/bin and $HOME/.local/bin to your $PATH environment variable. On Linux, run the following in your terminal:
export PATH=$PATH:$HOME/go/bin:$HOME/.local/bin

On macOS, run the following instead:

export PATH=$PATH:$HOME/go/bin:$HOME/Library/Python/3.6/bin
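After updating $PATH, a quick check can confirm that the three commands are reachable from the current shell. The sketch below is not part of mcorr: `missing_tools` is a hypothetical helper name, and it relies only on the standard `command -v` shell builtin.

```shell
# Hypothetical helper (not part of mcorr): print the name of every
# command that cannot be found on the current $PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Prints nothing when all three mcorr commands are installed and on $PATH;
# otherwise prints the names that still need attention.
missing_tools mcorr-xmfa mcorr-bam mcorr-fit
```

If a name is printed, the directory it was installed into is probably not yet listed in $PATH for this shell session.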

We have tested installation on Windows 10, Ubuntu 17.10, and macOS High Sierra, using Python 3 and Go v1.9.2.

Typical installation time on an iMac is 10 minutes.

Basic Usage

The inference of recombination parameters requires two steps:

  1. Calculate the Correlation Profile

    For whole-genome alignments (multiple gene alignments), use mcorr-xmfa:

    mcorr-xmfa <input XMFA file> <output prefix>

    The XMFA file should contain only coding sequences. A description of the XMFA format can be found at http://darlinglab.org/mauve/user-guide/files.html. We provide two useful pipelines to generate whole-genome alignments:

    For read alignments, use mcorr-bam:

    mcorr-bam <GFF3 file> <sorted BAM file> <output prefix>

    The GFF3 file is used for extracting the coding regions of the sorted BAM file.

    Both programs will produce two files:

    • a .csv file that stores the calculated Correlation Profile, which is used for fitting in the next step;
    • a .json file that stores the (intermediate) Correlation Profile for each gene.
  2. Fit the Correlation Profile using mcorr-fit:

    mcorr-fit <.csv file> <output_prefix>

    It will produce four files:

    • <output_prefix>_best_fit.svg shows the plots of the Correlation Profile, fitting, and residuals;
    • <output_prefix>_fit_reports.txt shows the summary of the fitted parameters;
    • <output_prefix>_fit_results.csv shows the table of fitted parameters;
    • <output_prefix>_parameter_histograms.svg shows the distributions of the fitted parameters.
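The two steps above can be chained in a small shell function. This is a minimal sketch, not part of mcorr: `run_mcorr` and the `DRY_RUN` variable are hypothetical names, and the script assumes that mcorr-xmfa writes its Correlation Profile to `<output prefix>.csv`, which the text above implies but does not state explicitly.

```shell
# Hypothetical wrapper (not part of mcorr) that chains the two steps.
# Assumption: mcorr-xmfa writes its Correlation Profile to "<prefix>.csv".
# Set DRY_RUN=1 to print the commands instead of running them.
run_mcorr() {
  xmfa="$1"
  prefix="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  else
    run=""
  fi
  # Step 1: calculate the Correlation Profile from the XMFA alignment.
  $run mcorr-xmfa "$xmfa" "$prefix" || return 1
  # Step 2: fit the profile; the fit outputs reuse the same prefix.
  $run mcorr-fit "${prefix}.csv" "$prefix" || return 1
}

# Preview the commands for a hypothetical input file without running them.
DRY_RUN=1 run_mcorr example.xmfa example_out
```

Leaving `DRY_RUN` unset runs both commands for real and stops at the first one that fails. For read alignments, the first step would call mcorr-bam with a GFF3 file and a sorted BAM file instead.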

Examples

  1. Inferring recombination rates of Helicobacter pylori from whole genome sequences of a set of global strains;
  2. Inferring recombination rates of Helicobacter pylori from reads sequenced from a transformation experiment.