This script allows you to run Freebayes in parallel, using multiple cores to process each line of a BED file. Each region specified in the BED file is processed in a separate core, and you can specify the number of cores to use as well as the base name for the output files.
- Python 3
- Required Python Packages:
- subprocess
- multiprocessing
- time
- Freebayes
- tabix
- bgzip
- bcftools
- A BED file specifying the regions to analyze
- An indexed reference genome file
- A list of BAM files names/paths (One per line)
-
Clone the repository:
git clone https://github.com/marceelrf/freebayes_multcore.git cd freebayes_multcore
-
Ensure FreeBayes is installed and available in your PATH:
sudo apt-get install freebayes
-
Prepare your BED file, reference genome file, and BAM list file.
-
Modify the script to specify the paths to your files and the number of cores to use:
BED_FILE = 'path/to/your_bed_file.bed' REF = 'path/to/your_reference.fasta' BAM_LIST = 'path/to/your_bam_list.txt' OUTPUT_BASE_NAME = 'output' # Change this to your desired base name NUM_CORES = 4 # Adjust this to the number of cores you want to use
-
Run the script:
python freebayes_multicores.py
This project is licensed under the MIT License. See the LICENSE file for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request
- Freebayes
- Python
- multiprocessing module
For any questions or suggestions, please open an issue or contact me at [marcel.ferreira@unesp.br].