Why is subset-bam not efficient for splitting a BAM file based on barcodes?

Question

kulansam opened this issue 3 years ago · 1 comments

Hi,

Thanks for developing subset-bam. I would like to split the BAM file (from Cell Ranger) into one BAM file per cell barcode, using the barcodes provided in the filtered_feature matrix folder (barcode.tsv). I run the following command in a for loop in my code, but it takes more than 6 days for around 4000K cells, even with multi-threading.

subset-bam_linux --bam filtered_barcodes_sorted.bam --cell-barcodes $line.tsv --cores 15 --out-bam ./filter_cell_individual_bam/$line.bam

Is there any way to speed up this process?

Answer 1 · 2022-05-05T16:20:13.000Z

What I did was split barcode.tsv into many text files, one barcode per file. You can then run subset-bam on each of those files as a batch.
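A minimal sketch of that approach, in case it helps. It assumes barcode.tsv is uncompressed with one barcode per line, and that `xargs` supports `-P` for parallel jobs. The function names `split_barcodes` and `run_batch`, the default job count, and the use of `--cores 1` per job are my own choices for illustration, not part of subset-bam; the subset-bam flags are the ones from the command in the question.

```shell
#!/usr/bin/env bash
# Sketch only: split_barcodes and run_batch are hypothetical helper names.

# Write one file per barcode: <out_dir>/<barcode>.tsv containing that barcode.
split_barcodes() {
    local barcodes_file="$1"   # e.g. barcode.tsv, one barcode per line
    local out_dir="$2"         # directory for the per-barcode files
    local barcode
    mkdir -p "$out_dir"
    while IFS= read -r barcode || [ -n "$barcode" ]; do
        # Skip blank lines so no empty-named file is created.
        if [ -n "$barcode" ]; then
            printf '%s\n' "$barcode" > "$out_dir/$barcode.tsv"
        fi
    done < "$barcodes_file"
}

# Run subset-bam once per barcode file, several jobs at a time.
# Parallelism comes from running many jobs (xargs -P), so each job
# uses a single core here; that split is an assumption, tune as needed.
run_batch() {
    local bam="$1" barcode_dir="$2" out_dir="$3" jobs="${4:-4}"
    mkdir -p "$out_dir"
    find "$barcode_dir" -name '*.tsv' -print0 |
        xargs -0 -P "$jobs" -I{} sh -c '
            name=$(basename "$1" .tsv)
            subset-bam_linux --bam "$2" --cell-barcodes "$1" \
                --cores 1 --out-bam "$3/$name.bam"
        ' _ {} "$bam" "$out_dir"
}

# Example usage (paths taken from the question, not executed here):
#   split_barcodes barcode.tsv ./per_barcode
#   run_batch filtered_barcodes_sorted.bam ./per_barcode ./filter_cell_individual_bam 15
```

Note that each subset-bam job still reads through the BAM file on its own, so this spreads the work across cores rather than reducing the total amount of it.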