This Python script extracts reads from a BAM file based on specified cell barcodes and creates separate BAM files for each barcode.
Requirements:
- Python 3.x
- pysam library

Installation:
- Ensure you have Python 3.x installed on your system.
- Install the required library:
  pip install pysam
Run the script from the command line with the following syntax:
python bam_barcode_extractor.py <input_bam> <barcode_file> <output_directory>
- <input_bam>: Path to the input BAM file
- <barcode_file>: Path to a text file containing cell barcodes (one per line)
- <output_directory>: Path to the directory where the output BAM files will be saved
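The script's own argument handling is not shown here. A minimal sketch of how the three positional arguments could be parsed with the standard-library `argparse` module might look like this; the function name `parse_args` is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments (hypothetical helper).

    `argv` defaults to None, in which case argparse reads sys.argv[1:].
    """
    parser = argparse.ArgumentParser(
        description="Split a BAM file into one BAM file per cell barcode (CB tag)."
    )
    parser.add_argument("input_bam", help="Path to the input BAM file")
    parser.add_argument("barcode_file", help="Text file with one cell barcode per line")
    parser.add_argument("output_directory", help="Directory for the per-barcode BAM files")
    return parser.parse_args(argv)
```

For example, `parse_args(["sample.bam", "barcodes.txt", "out"])` returns a namespace whose `input_bam`, `barcode_file`, and `output_directory` attributes hold those three strings.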
- BAM File: A sorted and indexed BAM file containing aligned reads, with the cell barcode stored in the CB tag.
- Barcode File: A text file with one cell barcode per line. For example:
  AAACCTGAGAAACCAT
  AAACCTGAGAAACCGC
  AAACCTGAGAAACCCA
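Reading the barcode file might look like the sketch below; the helper names `parse_barcodes` and `load_barcodes` are hypothetical. It assumes barcodes are compared exactly against the CB tag value, so the file should use the same form the BAM does. Cell Ranger, for instance, writes CB values with a suffix such as `-1` (AAACCTGAGAAACCAT-1), and a file without that suffix would match nothing.

```python
def parse_barcodes(lines):
    """Return the set of barcodes in an iterable of lines (hypothetical helper).

    Surrounding whitespace (including the newline) is stripped, blank lines
    are skipped, and duplicates collapse because the result is a set.
    """
    return {line.strip() for line in lines if line.strip()}


def load_barcodes(path):
    """Read a barcode file with one barcode per line."""
    with open(path) as handle:
        return parse_barcodes(handle)
```

Returning a set gives constant-time membership checks when each read's CB tag is looked up.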
The script creates a separate BAM file for each cell barcode in the specified output directory. Each file is named after its barcode (e.g., AAACCTGAGAAACCAT.bam).
- The script reads the input BAM file and the barcode file.
- It creates a new BAM file for each unique barcode in the barcode file.
- It then iterates through all reads in the input BAM file.
- For each read, it checks the CB tag (cell barcode).
- If the CB tag matches one of the barcodes from the barcode file, the read is written to the corresponding output BAM file.
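The steps above can be sketched with pysam as follows. This is an illustration under stated assumptions, not the script's actual code: the names `route_reads` and `split_bam` are hypothetical, the output directory is created if it is missing, and reads are streamed with `fetch(until_eof=True)`, which includes unmapped reads and does not need an index. The original script may instead use an indexed `fetch()`.

```python
import os


def route_reads(reads, barcodes):
    """Yield (barcode, read) for each read whose CB tag is in `barcodes`.

    Reads without a CB tag, or whose barcode is not listed, are skipped.
    """
    for read in reads:
        if read.has_tag("CB"):
            barcode = read.get_tag("CB")
            if barcode in barcodes:
                yield barcode, read


def split_bam(input_bam, barcodes, output_dir):
    """Write each matching read to <output_dir>/<barcode>.bam (hypothetical helper)."""
    import pysam  # imported here so route_reads can be used without pysam installed

    os.makedirs(output_dir, exist_ok=True)
    with pysam.AlignmentFile(input_bam, "rb") as bam:
        # One writer per barcode, opened up front; each copies the input header.
        writers = {
            bc: pysam.AlignmentFile(
                os.path.join(output_dir, f"{bc}.bam"), "wb", template=bam
            )
            for bc in barcodes
        }
        try:
            # until_eof=True streams every record in file order without an index.
            for barcode, read in route_reads(bam.fetch(until_eof=True), barcodes):
                writers[barcode].write(read)
        finally:
            for writer in writers.values():
                writer.close()
```

Keeping the barcode test in `route_reads`, separate from the file handling in `split_bam`, makes the matching logic easy to check on its own. Because reads are written in input order, each output file keeps the input's sort order.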
- Ensure that your BAM file has the CB tag for cell barcodes.
- The script assumes that the BAM file is sorted and indexed.
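- Large BAM files may require significant processing time and memory, since every read in the input is examined. Because one output file is opened per barcode, a very long barcode list may also exceed the operating system's limit on open files.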
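The notes above can be checked before a long run with a small preflight sketch like this one. The names `fraction_with_tag` and `preflight` are hypothetical. It relies on pysam's `AlignmentFile.has_index()` and on the `SO` field of the `@HD` header line, which records the declared sort order (`coordinate` for a coordinate-sorted BAM) but is not verified against the reads themselves.

```python
from itertools import islice


def fraction_with_tag(reads, tag="CB", limit=1000):
    """Fraction of the first `limit` reads that carry `tag` (0.0 if there are none)."""
    sample = list(islice(reads, limit))
    if not sample:
        return 0.0
    return sum(read.has_tag(tag) for read in sample) / len(sample)


def preflight(input_bam, limit=1000):
    """Report index status, declared sort order, and CB-tag coverage (hypothetical helper)."""
    import pysam  # imported here so fraction_with_tag works without pysam installed

    with pysam.AlignmentFile(input_bam, "rb") as bam:
        sort_order = bam.header.to_dict().get("HD", {}).get("SO", "unknown")
        return {
            "indexed": bam.has_index(),
            "sort_order": sort_order,
            "cb_fraction": fraction_with_tag(bam.fetch(until_eof=True), "CB", limit),
        }
```

A `cb_fraction` close to 0.0 suggests the barcodes are missing or stored under a different tag, in which case splitting on CB would produce empty output files.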