PanGenomeAnalysisTool: A Python script for pan-genome analysis, generating plots, and statistical insights. Analyze gene presence and absence in multiple genomes effortlessly.
This Python script analyzes pan-genomes by estimating the parameters (\kappa) and (\gamma) of the Heaps' law equation (n = \kappa N^\gamma), where (n) is the number of pan-genome genes and (N) is the number of genomes. It takes the gene_presence_absence.Rtab output of the Roary tool as input and reports whether the pan-genome is open or closed based on the estimated (\gamma) value.
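As an illustration of how (\kappa) and (\gamma) can be estimated, here is a minimal sketch that fits the power law to a pan-genome curve with scipy.optimize.curve_fit. The function names heaps_law and fit_heaps_law, the log-log starting guess, and the synthetic data are assumptions made for this example; the script's own fitting routine may differ.

```python
import numpy as np
from scipy.optimize import curve_fit


def heaps_law(N, kappa, gamma):
    """Heaps' law: expected number of pan-genome genes after N genomes."""
    return kappa * np.power(N, gamma)


def fit_heaps_law(genome_counts, gene_counts):
    """Return (kappa, gamma) fitted to a pan-genome curve (hypothetical helper)."""
    N = np.asarray(genome_counts, dtype=float)
    n = np.asarray(gene_counts, dtype=float)
    # A straight-line fit in log-log space gives a reasonable starting guess:
    # log(n) = log(kappa) + gamma * log(N)
    slope, intercept = np.polyfit(np.log(N), np.log(n), 1)
    params, _ = curve_fit(heaps_law, N, n, p0=(np.exp(intercept), slope))
    return params[0], params[1]


# Synthetic curve generated from known parameters, for illustration only.
N = np.arange(1, 21)
n = 2000.0 * N ** 0.3
kappa, gamma = fit_heaps_law(N, n)
print(f"kappa = {kappa:.1f}, gamma = {gamma:.3f}")
```

On real data the points would be the mean pan-genome sizes obtained from the Roary matrix rather than a synthetic curve.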
- Estimate the parameters (\kappa) and (\gamma) of the Heaps' law equation.
- Determine if the pan-genome is open or closed based on the (\gamma) value.
- Generate pan-genome and modified core-genome plots.
- Save statistics and plots as high-resolution images for publication.
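The curves behind such plots are typically built by adding genomes in random order and counting genes at each step. The sketch below shows one way to do this for a 0/1 presence/absence matrix (genes as rows, genomes as columns, the layout assumed here for gene_presence_absence.Rtab). The function name accumulation_curves and the toy matrix are hypothetical, and the core-genome here is the plain definition (genes present in every genome sampled so far), which may not match the script's modified core-genome.

```python
import numpy as np


def accumulation_curves(presence, iterations=10, seed=0):
    """Mean pan- and core-genome sizes over random genome orderings.

    presence: 2D array-like of 0/1 values, genes as rows, genomes as columns.
    """
    presence = np.asarray(presence, dtype=bool)
    n_genomes = presence.shape[1]
    rng = np.random.default_rng(seed)
    pan = np.zeros((iterations, n_genomes))
    core = np.zeros((iterations, n_genomes))
    steps = np.arange(1, n_genomes + 1)
    for i in range(iterations):
        order = rng.permutation(n_genomes)
        # For each gene, how many of the first k genomes carry it
        counts = np.cumsum(presence[:, order], axis=1)
        pan[i] = (counts > 0).sum(axis=0)        # seen in at least one genome
        core[i] = (counts == steps).sum(axis=0)  # seen in all k genomes so far
    return pan.mean(axis=0), core.mean(axis=0)


# Toy matrix: 5 genes x 3 genomes, for illustration only.
toy = [
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
]
pan_curve, core_curve = accumulation_curves(toy, iterations=10)
print(pan_curve, core_curve)
```

Averaging over several orderings smooths out the effect of any single genome order, which is presumably the role of the iterations argument.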
Installation:
- Ensure you have Python 3.x installed on your system.
- Install required Python packages using pip:
pip install numpy matplotlib scipy
Running the Script:
- Clone this repository to your local machine.
- Navigate to the directory containing pan_genome_analysis.py.
Command-line Usage:
- Run the script using the following command-line arguments:
python pan_genome_analysis.py -f input_file -i iterations -o output_dir
  - -f, --input_file: Path to the gene_presence_absence.Rtab output file of the Roary tool.
  - -i, --iterations: Number of iterations for analysis (default: 10).
  - -o, --output_dir: Directory path for output files.
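For readers who want to see how these options map onto code, here is a minimal argparse sketch of the same interface. The flag names and the default of 10 iterations come from the list above; the function name build_parser, the help strings, and the choice to make -f and -o required are assumptions, not taken from the script.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Pan-genome analysis of a Roary gene_presence_absence.Rtab file."
    )
    # Marking -f and -o as required is an assumption made for this sketch.
    parser.add_argument("-f", "--input_file", required=True,
                        help="Path to the Roary gene_presence_absence.Rtab file")
    parser.add_argument("-i", "--iterations", type=int, default=10,
                        help="Number of iterations for analysis (default: 10)")
    parser.add_argument("-o", "--output_dir", required=True,
                        help="Directory path for output files")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(
    ["-f", "input/gene_presence_absence.Rtab", "-o", "output"]
)
print(args.input_file, args.iterations, args.output_dir)
```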
Output:
- The script will generate the following outputs:
  - Pan-genome and modified core-genome plots in high-resolution image formats (e.g., PNG, PDF) saved in the specified output directory.
  - A text file named pan_genome_statistics.txt containing the values of (\kappa), (\gamma), and the pan-genome status (open or closed).
Interpreting Results:
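- The script estimates (\kappa) and (\gamma) for the Heaps' law equation and determines whether the pan-genome is open or closed based on the (\gamma) value. An open pan-genome keeps gaining new genes as more genomes are added, while a closed pan-genome levels off.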
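The decision rule can be sketched as a small function. It assumes the usual reading of (n = \kappa N^\gamma): a positive (\gamma) means the gene count keeps growing with (N) (open), and a (\gamma) of zero or below means it does not (closed). The exact threshold the script applies is not stated here, so the function name pan_genome_status and the cutoff parameter are hypothetical.

```python
def pan_genome_status(gamma, cutoff=0.0):
    """Classify a pan-genome from the fitted Heaps' law exponent.

    cutoff is an assumed threshold: gamma above it is treated as open.
    """
    return "open" if gamma > cutoff else "closed"


for gamma in (0.3, 0.0, -0.1):
    print(gamma, pan_genome_status(gamma))
```

Exposing the cutoff as a parameter makes it easy to match whichever convention a given analysis follows.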
Test run: To demonstrate the usage of the pan_genome_analysis.py script, we provide a test directory with input and output subdirectories containing a test input file and its output files. You can reproduce the analysis using the following command:
python pan_genome_analysis.py -f input/gene_presence_absence.Rtab -o output -i 10
This script is designed to process the output files of the Roary tool and analyze pan-genomes. We would like to acknowledge the developers of the Roary tool for their contribution to the field of comparative genomics. The Roary tool is a valuable resource for pan-genome analysis, and its documentation is available at the Roary GitHub repository.
If you use the pan_genome_analysis.py script in your research, please consider citing it as follows:
Sharma, V. (2024). pan_genome_analysis.py [Python script]. Retrieved from https://github.com/vsmicrogenomics/PanGenomeAnalysisTool