epi2me-labs/modbam2bed

Hi cjw85,

Closed this issue · 1 comments

cjw85 commented

Hi cjw85,

I seem to have found another bug. But I am new to these file formats and I might be making a mistake. I am still using the files I included in my zip file, which is why I am posting in this thread.

The reference fasta is

cat dummy.fa
>dummyI
AGCTAGCTATCGTTTCTGTGAG
>dummyII
GGGGGGGGGGTCTCTAACGACCAA
>dummyIII
CCACACCACACCCACACACCCACACATCAAATCCACACCACACCACACCC
TGGGAGCCACCATAACGGCCTTATTG

The bam file without headers is

samtools view dummy.bam
5d10eb9a-aae1-4db8-8ec6-7ebb34d32575    0       dummyI  10      255     8M      *       0       8       TCGTTTCT        *       MM:Z:T+T?,0,0,0,0,0;    ML:B:C,4,7,9,8,6   X
R:i:97586873    XA:Z:5d10eb9a-aae1-4db8-8ec6-7ebb34d32575_dummyI_9_17_fwd
a4f36092-b4d5-47a9-813e-c22c3b477a0c    0       dummyIII        24      255     48M     *       0       48      ACATCAAATCCACACCACACCACACCCTGGGAGCCACCATAACGGCCT        *  M
M:Z:T+T?,0,0,0,0,0;     ML:B:C,221,242,3,47,239 XR:i:172963337  XA:Z:a4f36092-b4d5-47a9-813e-c22c3b477a0c_dummyIII_23_71_fwd
fffffff1-10d2-49cb-8ca3-e8d48979001b    16      dummyII 4       255     21M     *       0       21      GGGGGGGTCTCTAACGACCAA   *       MM:Z:T+T?,5,0,0,0,0;    ML:B:C,182,3
,4,3,3  XR:i:268435455  XA:Z:fffffff1-10d2-49cb-8ca3-e8d48979001b_dummyII_3_36_rev

If you look at the third line of the bam file, you will see that the reverse-complemented entry fffffff1-10d2-49cb-8ca3-e8d48979001b has the sequence GGGGGGGTCTCTAACGACCAA and the modification data MM:Z:T+T?,5,0,0,0,0; ML:B:C,182,3,4,3,3.

The modification string is clearly wrong as once you've skipped 5 thymidines on the reverse complemented strand, there are no thymidines left! Yet, modbampy runs fine as evidenced by my snippets from #20 (comment)

Do you think this is a bug or am I making a mistake? Thanks!

Originally posted by @sathish-t in #20 (comment)

cjw85 commented

Fix has been implemented in htslib.