Using Correlation Profile of mutations to infer the recombination rate from large-scale sequencing data in bacteria.
- Install git from https://git-scm.com;
- Install go from https://golang.org/doc/install;
- Install python3 from https://www.python.org/ (we ran into problems with the default Python in MacOS);
- Install pip3 from https://pip.pypa.io/en/stable/installing/.
- Install mcorr-xmfa, mcorr-bam, and mcorr-fit from your terminal:
go get -u github.com/kussell-lab/mcorr/cmd/mcorr-xmfa
go get -u github.com/kussell-lab/mcorr/cmd/mcorr-bam
cd $HOME/go/src/github.com/kussell-lab/mcorr/cmd/mcorr-fit
python3 setup.py install
or, to install mcorr-fit in a local directory (~/.local/bin in Linux or ~/Library/Python/3.6/bin in MacOS):
python3 setup.py install --user
- Add $HOME/go/bin and $HOME/.local/bin to your $PATH environment. In Linux, you can do it in your terminal:
export PATH=$PATH:$HOME/go/bin:$HOME/.local/bin
In MacOS, you can do it as follows:
export PATH=$PATH:$HOME/go/bin:$HOME/Library/Python/3.6/bin
We have tested installation in Windows 10, Ubuntu 17.10, and MacOS High Sierra, using Python 3 and Go v1.9.2.
Typical installation time on an iMac is 10 minutes.
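Once the PATH has been updated, it can help to confirm that the three commands are actually reachable. The following is a minimal sketch, not part of mcorr itself: check_mcorr_tools is a hypothetical helper name, and it relies only on the shell's standard command -v lookup.

```shell
# Report whether each mcorr command can be found on the current PATH.
# check_mcorr_tools is a hypothetical helper, not shipped with mcorr.
check_mcorr_tools() {
    missing=0
    for tool in mcorr-xmfa mcorr-bam mcorr-fit; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status is 0 only when all three commands were found.
    return "$missing"
}

check_mcorr_tools || echo "Add the install directories to PATH and retry."
```

If a command is reported as missing, the most likely cause is that $HOME/go/bin or the Python script directory is not yet on the PATH of the current shell session.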
The inference of recombination parameters requires two steps:
- Calculate Correlation Profile
For whole-genome alignments (multiple gene alignments), use mcorr-xmfa:
mcorr-xmfa <input XMFA file> <output prefix>
The XMFA files should contain only coding sequences. A description of the XMFA file format can be found at http://darlinglab.org/mauve/user-guide/files.html. We provide two pipelines to generate whole-genome alignments:
- from multiple assemblies: https://github.com/kussell-lab/AssemblyAlignmentGenerator;
- from raw reads: https://github.com/kussell-lab/ReferenceAlignmentGenerator
For read alignments, use mcorr-bam:
mcorr-bam <GFF3 file> <sorted BAM file> <output prefix>
The GFF3 file is used for extracting the coding regions of the sorted BAM file.
Both programs will produce two files:
- a .csv file that stores the calculated Correlation Profile, which will be used for fitting in the next step;
- a .json file that stores the (intermediate) Correlation Profile for each gene.
- Fit the Correlation Profile using mcorr-fit:
mcorr-fit <.csv file> <output_prefix>
It will produce four files:
- <output_prefix>_best_fit.svg shows the plots of the Correlation Profile, fitting, and residuals;
- <output_prefix>_fit_reports.txt shows the summary of the fitted parameters;
- <output_prefix>_fit_results.csv shows the table of fitted parameters;
- <output_prefix>_parameter_histograms.svg shows the distributions of the fitted parameters.
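The two steps can be chained in a small shell function. This is only a sketch under stated assumptions: run_mcorr_pipeline is a hypothetical helper name, and it assumes that mcorr-xmfa writes its Correlation Profile to <output prefix>.csv, which is not spelled out above. It uses only the positional arguments documented in this README and then reports which of the listed fit outputs are present.

```shell
# Run both inference steps for one XMFA alignment, then list the fit outputs.
# run_mcorr_pipeline is a hypothetical helper, not shipped with mcorr.
# Assumption: mcorr-xmfa writes its Correlation Profile to "<prefix>.csv".
run_mcorr_pipeline() {
    xmfa_file=$1
    prefix=$2

    if [ ! -f "$xmfa_file" ]; then
        echo "input not found: $xmfa_file" >&2
        return 1
    fi

    # Step 1: calculate the Correlation Profile.
    mcorr-xmfa "$xmfa_file" "$prefix" || return 1

    # Step 2: fit the Correlation Profile.
    mcorr-fit "${prefix}.csv" "$prefix" || return 1

    # Report which of the documented mcorr-fit output files exist.
    rc=0
    for suffix in _best_fit.svg _fit_reports.txt _fit_results.csv _parameter_histograms.svg; do
        if [ -f "${prefix}${suffix}" ]; then
            echo "ok: ${prefix}${suffix}"
        else
            echo "missing: ${prefix}${suffix}"
            rc=1
        fi
    done
    return "$rc"
}

# Example call (hypothetical file names):
#   run_mcorr_pipeline core_genome.xmfa core_genome_out
```

For read alignments, the first step would call mcorr-bam with the GFF3 file and the sorted BAM file instead; the fitting step stays the same.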