Hi cjw85,
Closed this issue · 1 comments
Hi cjw85,
I seem to have found another bug. But I am new to these file formats and I might be making a mistake. I am still using the files I included in my zip file, which is why I am posting in this thread.
The reference fasta is
cat dummy.fa
>dummyI AGCTAGCTATCGTTTCTGTGAG >dummyII GGGGGGGGGGTCTCTAACGACCAA >dummyIII CCACACCACACCCACACACCCACACATCAAATCCACACCACACCACACCC TGGGAGCCACCATAACGGCCTTATTG
The bam file without headers is
samtools view dummy.bam
5d10eb9a-aae1-4db8-8ec6-7ebb34d32575 0 dummyI 10 255 8M * 0 8 TCGTTTCT * MM:Z:T+T?,0,0,0,0,0; ML:B:C,4,7,9,8,6 X R:i:97586873 XA:Z:5d10eb9a-aae1-4db8-8ec6-7ebb34d32575_dummyI_9_17_fwd a4f36092-b4d5-47a9-813e-c22c3b477a0c 0 dummyIII 24 255 48M * 0 48 ACATCAAATCCACACCACACCACACCCTGGGAGCCACCATAACGGCCT * M M:Z:T+T?,0,0,0,0,0; ML:B:C,221,242,3,47,239 XR:i:172963337 XA:Z:a4f36092-b4d5-47a9-813e-c22c3b477a0c_dummyIII_23_71_fwd fffffff1-10d2-49cb-8ca3-e8d48979001b 16 dummyII 4 255 21M * 0 21 GGGGGGGTCTCTAACGACCAA * MM:Z:T+T?,5,0,0,0,0; ML:B:C,182,3 ,4,3,3 XR:i:268435455 XA:Z:fffffff1-10d2-49cb-8ca3-e8d48979001b_dummyII_3_36_rev
If you look at the third line of the bam file, you will see that the reverse-complemented entry fffffff1-10d2-49cb-8ca3-e8d48979001b has the sequence GGGGGGGTCTCTAACGACCAA and the modification data MM:Z:T+T?,5,0,0,0,0; ML:B:C,182,3,4,3,3.
The modification string is clearly wrong as once you've skipped 5 thymidines on the reverse complemented strand, there are no thymidines left! Yet, modbampy runs fine as evidenced by my snippets from #20 (comment)
Do you think this is a bug or am I making a mistake? Thanks!
Originally posted by @sathish-t in #20 (comment)
Fix has been implemented in htslib.