genome/bam-readcount

Only counting multiple aligned reads for one chromosome with "-l *.bed" and "-q 0" parameters

Opened this issue · 5 comments

I have a small test dataset I've created to measure allelic expression of RNA-seq aligned reads. In one of my test examples, I have two very small chromosomes that are identical and test reads that align to both chromosomes. However, when I use bam-readcount with the parameters "-l *.bed" and "-q 0" to count reads that align to a SNP on both chromosomes, bam-readcount only reports reads for one chromosome.

I've performed the same analysis using "bedtools coverage" and the output reported the same coverage for both chromosomes, as I would expect. However, I love how bam-readcount reports reads for each nucleotide at each position and would prefer to keep using bam-readcount. Please let me know if you have any suggestions or if you would like a copy of my test files.

I'm not entirely clear about the details here - could you share a small example bam (even if it's only a handful of reads?) and a command that allows us to reproduce the problem? Thanks!

Hey Chris, thanks for getting back to me so fast. I really appreciate it.

I went ahead and attached all the files needed to run bam-readcount. The command I use is in the command.txt file.

You'll notice in the .sam.txt file that the paired end reads labeled as "Gene2_WT_1-3" align to both Gene2 and Gene2_1 chromosomes. However, the results found in the bam-readcount_output.txt file only report aligned reads for Gene2_1.

Let me know if you have any other questions or concerns for me.

Bam-readcount_Test.zip

Hey Chris, I'm sure you have a million other things going on, but I was wondering if I could get an update from your end. I'm I misinterpreting my results somehow or is there an issue in counting multiple aligned reads that needs to be fixed? If there is a bug, how long do you think it would take to fix it?

Took a quick look at this tonight. Here's a paired end read and it's sam flags:

Gene2_WT_1      99      Gene2_1 191
Gene2_WT_1      147     Gene2_1 191
Gene2_WT_1      355     Gene2   191
Gene2_WT_1      403     Gene2   191

I assume the issue is caused by the fact that the second pair of entries for that read, it is labelled as a non-primary alignment. (see https://broadinstitute.github.io/picard/explain-flags.html)

See this issue, which specifies that they're excluded intentionally

bam-readcount excludes secondary, qc failed and duplicate reads from the resulting counts.

#46 (comment)

There is also an issue dating back several years requesting the ability to use -F syntax to specify flags to include/exclude, but it's not a priority to implement, and if your workflow requires counting at secondary alignments, then bam-readcount might not be the right tool for you right now. Best of luck!

I didn't realize that bam-readcount intentionally excluded secondary alignments. Thanks for helping me out with this Chris and best of luck to you too!