mozack/abra2

Realigned output bam TLEN field plus/minus sign when FLAG == 147

Julie-Zhongyun-Huang opened this issue · 2 comments

Hi there!
We have recently become very interested in abra2 for fast and accurate reassembly/realignment of indels.
When using the realigned BAM from abra2 with other tools, we discovered the following potential issue. Please see this example read pair:

A00337:46:HHGVNDMXX:1:1441:31946:25316:CTGCAGTA:CTGCAGTA:GA:AA  147     chr16   3727646 60      139M    =       3727648 139     TTCCTAGATGCCTGGATTTTCAGTACAAAAGGTCCAAGAACATGAAAGGGGAAAGGTGATGCTCTCACAATGCTACAAGCCCTCCACAAACTTCTCTAGCGTGTCCCCCGTGGTGTCCCCGACCAGGGACAGTTCGCTG     :FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF     YA:Z:chr16:3727129:964M MD:Z:48A88      RG:Z:4  NM:i:3  YM:i:2  YO:Z:chr16:3727648:-:2S137M     AS:i:132        XS:i:23 YX:i:3
A00337:46:HHGVNDMXX:1:1441:31946:25316:CTGCAGTA:CTGCAGTA:GA:AA  99      chr16   3727648 60      8S130M  =       3727646 -139    TTTTTATTCCTAGATGCCTGGATTTTCAGTACAAAAGGTCCAAGAACATGAAAGGGGAAAGGTGATGCTCTCACAATGCTACAAGCCCTCCACAAACTTCTCTAGCGTGTCCCCCGTGGTGTCCCCGACCAGGGACAG      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      YA:Z:chr16:3727129:964M MD:Z:48A81      RG:Z:4  NM:i:1  AS:i:125        XS:i:23
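To see where each read actually lies on the reference, the rightmost mapped base can be derived from POS and CIGAR: only M, D, N, = and X consume reference, so soft clips (S) do not. The sketch below is illustrative only; `reference_end` is a hypothetical helper, not part of abra2. It suggests that the FLAG 147 read (139M at 3727646) spans 3727646 to 3727784, while the FLAG 99 read (8S130M at 3727648) spans 3727648 to 3727777. The template therefore runs from 3727646 to 3727784, which is 139 bases and matches |TLEN|.

```python
import re

# CIGAR operations that advance along the reference (SAM spec). Soft clips (S),
# insertions (I), hard clips (H) and padding (P) do not.
REF_CONSUMING = set("MDN=X")


def reference_end(pos: int, cigar: str) -> int:
    """Hypothetical helper: 1-based, inclusive rightmost reference position of an alignment."""
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    return pos + ref_len - 1


# FLAG -> (POS, CIGAR), copied from the two records above.
reads = {147: (3727646, "139M"), 99: (3727648, "8S130M")}
for flag, (pos, cigar) in reads.items():
    print(flag, pos, reference_end(pos, cigar))
```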

In this example, for the FLAG == 147 read, POS (column 4, here 3727646) is less than PNEXT (column 8, here 3727648), and TLEN (column 9, here 139) has a plus sign.

However, in other BAM files that have not been realigned/reassembled, TLEN always has a minus sign in this situation (FLAG == 147 and POS < PNEXT).
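For reference, FLAG 147 decodes to paired, proper pair, reverse strand, second in pair, and its mate's FLAG 99 decodes to paired, proper pair, mate reverse strand, first in pair. The small decoder below is a sketch using the bit values from the SAM specification. The names `decode_flag` and `is_147_left_of_mate` are hypothetical, chosen only for illustration. The second function simply encodes the condition described above.

```python
# FLAG bit values and their meanings, as listed in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper pair",
    0x4: "unmapped",
    0x8: "mate unmapped",
    0x10: "reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "secondary",
    0x200: "QC fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the names of the set FLAG bits, in ascending bit order."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_147_left_of_mate(flag: int, pos: int, pnext: int) -> bool:
    """Hypothetical check for the situation in this issue: FLAG == 147 and POS < PNEXT."""
    return flag == 147 and pos < pnext


print(decode_flag(147))
print(decode_flag(99))
print(is_147_left_of_mate(147, 3727646, 3727648))
```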

According to the SAM format specification, the leftmost segment has a plus sign in TLEN and the rightmost has a minus sign. For FLAG == 147 (second in pair, reverse-complemented), the segment should still be the rightmost when POS < PNEXT.
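A literal reading of the TLEN rule quoted above is sketched below. It computes the unsigned length from the leftmost to the rightmost mapped base and takes "leftmost" to mean the segment with the smaller POS. The function `signed_tlen` is hypothetical, and this is not how abra2 or any other tool is confirmed to compute TLEN. Its handling of equal POS values is also an assumption, since "leftmost" alone does not break a tie. The inputs are the pair above: the FLAG 147 read is 139M at 3727646, so it ends at 3727784. The FLAG 99 read is 8S130M at 3727648, so it ends at 3727777 because soft clips consume no reference.

```python
def signed_tlen(pos: int, end: int, mate_pos: int, mate_end: int) -> int:
    """Hypothetical sketch of the SAM TLEN wording for a pair on one reference.

    pos/end are the 1-based, inclusive reference coordinates of this segment's
    mapped bases; mate_pos/mate_end are the same for its mate.
    """
    # Unsigned length: leftmost mapped base to rightmost mapped base of the template.
    span = max(end, mate_end) - min(pos, mate_pos) + 1
    if pos < mate_pos:
        return span   # this segment starts leftmost -> plus sign
    if pos > mate_pos:
        return -span  # this segment starts rightmost -> minus sign
    # Equal POS: assumption of this sketch only; return the unsigned span.
    return span


print("FLAG 147:", signed_tlen(3727646, 3727784, 3727648, 3727777))
print("FLAG 99: ", signed_tlen(3727648, 3727777, 3727646, 3727784))
```

Under this literal reading, the FLAG 147 read has the smaller POS and gets +139, while the FLAG 99 read gets -139. Other tools may define "leftmost" differently, for example by the 5' end of each read, which would explain why their output differs.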

Please don't hesitate to let me know if the TLEN sign should be modified.
Thanks a lot!!

Julie

I'm not sure I fully grasp the issue here.

Based on a quick reading of the SAM spec, I could not find anything to support the following statement:

"For FLAG==147 (second of a pair / reverse-complemented), when POS < PNEXT, the segment should still be the rightmost."

Feel free to correct me if I am missing something and point me to where this is defined.

Also, it would be helpful to hear how this is affecting the downstream tools you are using with the realigned BAM. Thanks.