An example for mcorr

In this example, we will infer recombination parameters of Helicobacter pylori from whole-genome sequences of a panel of 401 strains, using the gene-by-gene alignments compiled by Thorell et al.

  1. Follow the installation instructions for mcorr, then download this repository:
cd ~/Downloads
git clone https://github.com/kussell-lab/Helicobacter_pylori_global_population.git
cd Helicobacter_pylori_global_population
  1. Download the gene-by-gene alignments from https://datadryad.org//resource/doi:10.5061/dryad.8qp4n and decompress the file:
curl -O https://datadryad.org/bitstream/handle/10255/dryad.134969/BIGSdb_gene-by-gene_alignment.xmfa.gz
gunzip -f BIGSdb_gene-by-gene_alignment.xmfa.gz
  1. For demonstration purposes, we randomly choose 50 strains and remove gene sequences with too many gaps (gaps making up more than 2% of the total gene length):
python3 random_choose_strains.py BIGSdb_gene-by-gene_alignment.xmfa Helicobacter_pylori.xmfa 50
  1. We then calculate correlation profiles with mcorr-xmfa and fit them with mcorr-fit. Together, the two commands take about 5-10 minutes on a typical PC:
mcorr-xmfa Helicobacter_pylori.xmfa Helicobacter_pylori
mcorr-fit Helicobacter_pylori.csv Helicobacter_pylori
  1. The results are written to three files:
    • Helicobacter_pylori_fit_reports.txt, which shows the fitting results;
    • Helicobacter_pylori_best_fit.svg, which shows the best fit;
    • Helicobacter_pylori_parameter_histograms.svg, which shows the distribution of the measured or inferred parameters.
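
To make the subsampling step (random strain choice plus the 2% gap filter) more concrete, here is a minimal Python sketch of the idea. It is not the repository's random_choose_strains.py, whose implementation is not shown here. The function names (parse_xmfa, strain_of, gap_fraction, subsample) are hypothetical, and the sketch assumes an XMFA layout in which each record starts with a ">" header line, gene blocks are separated by "=" lines, and the strain identifier is the first whitespace-separated token of the header. The real headers in the Dryad file may differ.

```python
import random


def parse_xmfa(text):
    """Split XMFA text into gene blocks; each block is a list of [header, sequence]."""
    blocks, block = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("="):        # "=" closes the current gene block
            if block:
                blocks.append(block)
            block = []
        elif line.startswith(">"):      # ">" opens a new sequence record
            block.append([line[1:], ""])
        elif block:                     # sequence data, possibly wrapped over lines
            block[-1][1] += line
    if block:
        blocks.append(block)
    return blocks


def strain_of(header):
    # ASSUMPTION: the strain id is the first token of the header line.
    return header.split()[0]


def gap_fraction(seq):
    """Fraction of alignment columns in seq that are gaps ('-')."""
    return seq.count("-") / len(seq) if seq else 1.0


def subsample(blocks, n_strains, max_gap=0.02, seed=0):
    """Pick n_strains at random, then drop gene sequences with too many gaps."""
    strains = sorted({strain_of(h) for block in blocks for h, _ in block})
    rng = random.Random(seed)           # fixed seed keeps the choice reproducible
    chosen = set(rng.sample(strains, min(n_strains, len(strains))))
    kept = []
    for block in blocks:
        records = [(h, s) for h, s in block
                   if strain_of(h) in chosen and gap_fraction(s) <= max_gap]
        if records:
            kept.append(records)
    return chosen, kept


# Toy alignment: two genes, three strains; s3's geneA copy is 20% gaps.
toy = """>s1 geneA
ACGTACGTAC
>s2 geneA
ACGTACGTAC
>s3 geneA
AC--ACGTAC
=
>s1 geneB
ACGTAC
>s2 geneB
ACGTAC
=
"""

blocks = parse_xmfa(toy)
chosen, kept = subsample(blocks, n_strains=3)
print([len(b) for b in kept])  # → [2, 2]
```

In the toy data, the s3 copy of geneA exceeds the 2% threshold and is dropped, while both genes are still retained for s1 and s2. The filter here is applied per sequence rather than per gene; whether the actual script drops the single sequence or the whole gene is an assumption of this sketch.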