An example for mcorr
In this example, we will infer recombination parameters of Helicobacter pylori by using whole-genome sequences of a panel of 401 strains. We will use the gene-by-gene alignments compiled by Thorell et al.
- Follow the installation instructions for mcorr, then download this repository:
cd ~/Downloads
git clone https://github.com/kussell-lab/Helicobacter_pylori_global_population.git
cd Helicobacter_pylori_global_population
- Download the gene-by-gene alignments from https://datadryad.org//resource/doi:10.5061/dryad.8qp4n and decompress the file:
curl -O https://datadryad.org/bitstream/handle/10255/dryad.134969/BIGSdb_gene-by-gene_alignment.xmfa.gz
gunzip -f BIGSdb_gene-by-gene_alignment.xmfa.gz
- For demonstration purposes, we randomly choose 50 strains and remove gene sequences with too many gaps (number of gaps > 2% of the total gene length):
python3 random_choose_strains.py BIGSdb_gene-by-gene_alignment.xmfa Helicobacter_pylori.xmfa 50
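The selection and filtering step can be pictured with the minimal sketch below. It is not the code of random_choose_strains.py, whose internals are not described here. The helper names, the fixed seed, the toy alignment, and the assumption that gaps are encoded as `-` are all illustrative.

```python
import random

def choose_strains(strains, k, seed=0):
    """Pick k strain names at random (hypothetical helper; seed fixed for reproducibility)."""
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(strains), k))

def gap_fraction(seq, gap_char="-"):
    """Fraction of alignment columns that are gaps (assumes '-' marks a gap)."""
    return seq.count(gap_char) / len(seq) if seq else 1.0

def filter_gene(alignment, chosen, max_gap=0.02):
    """Keep sequences of the chosen strains whose gap fraction is at most max_gap."""
    return {
        name: seq
        for name, seq in alignment.items()
        if name in chosen and gap_fraction(seq) <= max_gap
    }

# Toy gene alignment: strain name -> aligned sequence (100 columns each).
gene = {
    "s1": "A" * 100,            # no gaps
    "s2": "A" * 98 + "-" * 2,   # exactly 2% gaps, kept
    "s3": "A" * 97 + "-" * 3,   # 3% gaps, removed
}
kept = filter_gene(gene, chosen={"s1", "s2", "s3"})
print(sorted(kept))
```

A sequence is dropped only when its gap count exceeds 2% of its length, so a sequence with exactly 2% gaps is retained, matching the threshold stated in the step.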
- We then calculate correlation profiles using mcorr-xmfa, and perform the fitting using mcorr-fit. This takes 5-10 minutes on a typical PC.
mcorr-xmfa Helicobacter_pylori.xmfa Helicobacter_pylori
mcorr-fit Helicobacter_pylori.csv Helicobacter_pylori
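The two commands are linked by file name: mcorr-xmfa is given the output prefix Helicobacter_pylori, and mcorr-fit then reads Helicobacter_pylori.csv. The sketch below builds both command lines from one prefix so the names stay in sync. The helper names are hypothetical, only the positional arguments shown above are used, and nothing is executed unless `run_pipeline` is called on a machine where mcorr is installed.

```python
import shutil
import subprocess

def build_commands(xmfa_path, prefix):
    """Return the two command lines, mirroring the invocations shown above."""
    return [
        ["mcorr-xmfa", xmfa_path, prefix],
        ["mcorr-fit", prefix + ".csv", prefix],
    ]

def run_pipeline(xmfa_path, prefix):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(xmfa_path, prefix):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH; is mcorr installed?")
        subprocess.run(cmd, check=True)

commands = build_commands("Helicobacter_pylori.xmfa", "Helicobacter_pylori")
for cmd in commands:
    print(" ".join(cmd))
```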
- Results:
  - Helicobacter_pylori_fit_reports.txt, which shows the fitting results;
  - Helicobacter_pylori_best_fit.svg, which shows the best fit;
  - Helicobacter_pylori_parameter_histograms.svg, which shows the distribution of the measured or inferred parameters.
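To confirm that a run finished, one can check that the three result files exist. The sketch below derives their names from the output prefix, using the file names listed above. The helper names are hypothetical, and the check says nothing about the quality of the fit itself.

```python
from pathlib import Path

# Suffixes of the result files named in this example.
SUFFIXES = ("_fit_reports.txt", "_best_fit.svg", "_parameter_histograms.svg")

def expected_outputs(prefix, directory="."):
    """Paths of the result files expected for a given output prefix."""
    return [Path(directory) / (prefix + suffix) for suffix in SUFFIXES]

def missing_outputs(prefix, directory="."):
    """Names of expected result files that are not present on disk."""
    return [p.name for p in expected_outputs(prefix, directory) if not p.is_file()]

names = [p.name for p in expected_outputs("Helicobacter_pylori")]
print(names)
```

Calling `missing_outputs("Helicobacter_pylori")` from the repository directory after the run returns an empty list when all three files are present.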