Only counting multiple aligned reads for one chromosome with "-l *.bed" and "-q 0" parameters
Adglink opened this issue · 5 comments
I have a small test dataset I've created to measure allelic expression of RNA-seq aligned reads. In one of my test examples, I have two very small chromosomes that are identical and test reads that align to both chromosomes. However, when I use bam-readcount with the parameters "-l *.bed" and "-q 0" to count reads that align to a SNP on both chromosomes, bam-readcount only reports reads for one chromosome.
I've performed the same analysis using "bedtools coverage" and the output reported the same coverage for both chromosomes, as I would expect. However, I love how bam-readcount reports reads for each nucleotide at each position and would prefer to keep using bam-readcount. Please let me know if you have any suggestions or if you would like a copy of my test files.
I'm not entirely clear about the details here - could you share a small example bam (even if it's only a handful of reads?) and a command that allows us to reproduce the problem? Thanks!
Hey Chris, thanks for getting back to me so fast. I really appreciate it.
I went ahead and attached all the files needed to run bam-readcount. The command I use is in the command.txt file.
You'll notice in the .sam.txt file that the paired end reads labeled as "Gene2_WT_1-3" align to both Gene2 and Gene2_1 chromosomes. However, the results found in the bam-readcount_output.txt file only report aligned reads for Gene2_1.
Let me know if you have any other questions or concerns for me.
Hey Chris, I'm sure you have a million other things going on, but I was wondering if I could get an update from your end. I'm I misinterpreting my results somehow or is there an issue in counting multiple aligned reads that needs to be fixed? If there is a bug, how long do you think it would take to fix it?
Took a quick look at this tonight. Here's a paired end read and it's sam flags:
Gene2_WT_1 99 Gene2_1 191
Gene2_WT_1 147 Gene2_1 191
Gene2_WT_1 355 Gene2 191
Gene2_WT_1 403 Gene2 191
I assume the issue is caused by the fact that the second pair of entries for that read, it is labelled as a non-primary alignment. (see https://broadinstitute.github.io/picard/explain-flags.html)
See this issue, which specifies that they're excluded intentionally
bam-readcount excludes secondary, qc failed and duplicate reads from the resulting counts.
There is also an issue dating back several years requesting the ability to use -F syntax to specify flags to include/exclude, but it's not a priority to implement, and if your workflow requires counting at secondary alignments, then bam-readcount might not be the right tool for you right now. Best of luck!
I didn't realize that bam-readcount intentionally excluded secondary alignments. Thanks for helping me out with this Chris and best of luck to you too!