smarco/WFA2-lib

How to control the primary orientation of the alignment?

y9c opened this issue · 1 comment

y9c commented

Is there a term in the alignment field for the difference shown in the example below? WFA2 reports a different alignment than other Smith-Waterman-based tools such as CSSW. There are at least two possible alignments for this query read, and they have identical alignment scores. CSSW opens the gap at the leftmost possible position and reports --GT, whereas WFA2 reports GT--.
Is there a parameter in this library to control this behavior and make WFA2 consistent with other tools?

CSSW:
ref      TAGTCTGGCACGGTGAAGAGACATGAGAGGTGTAGAATAAGTGGGAGGCCCCCGGCGCCCGGCCCCGTC
         |||||||||||||||||||||||||||||  ||||||||||||||||||||||||||||||||||||||
query    TAGTCTGGCACGGTGAAGAGACATGAGAG--GTAGAATAAGTGGGAGGCCCCCGGCGCCCGGCCCCGTC

WFA2:
ref      TAGTCTGGCACGGTGAAGAGACATGAGAGGTGTAGAATAAGTGGGAGGCCCCCGGCGCCCGGCCCCGTC
         |||||||||||||||||||||||||||||||  ||||||||||||||||||||||||||||||||||||
query    TAGTCTGGCACGGTGAAGAGACATGAGAGGT--AGAATAAGTGGGAGGCCCCCGGCGCCCGGCCCCGTC

Hi,

This is an intrinsic ambiguity of the problem. Both alignments are optimal; the equations are satisfied by both. Due to the greedy nature of WFA, the gap is "pushed" to the right, toward the end of the sequences. If you align the reversed sequences, WFA would place the gap on the left. Similarly, in the traceback of every Smith-Waterman-based algorithm, you can reorder the conditionals so that "match" is preferred over "gap" (or the other way around).
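To make the ambiguity concrete, here is a minimal Python sketch that does not use WFA2 or CSSW at all. It simply enumerates every offset at which deleting one contiguous run of bases from the reference reproduces the query, using the sequences from the example above. The helper names (`deletion_placements`, `render_query_row`) are hypothetical and exist only for this illustration. For this pair it finds three equivalent offsets; the leftmost and rightmost ones correspond to the two outputs reported above.

```python
# Sequences copied from the example above; the query is recovered by
# stripping the gap characters from the reported alignment row.
REF = "TAGTCTGGCACGGTGAAGAGACATGAGAGGTGTAGAATAAGTGGGAGGCCCCCGGCGCCCGGCCCCGTC"
CSSW_ROW = "TAGTCTGGCACGGTGAAGAGACATGAGAG--GTAGAATAAGTGGGAGGCCCCCGGCGCCCGGCCCCGTC"
WFA2_ROW = "TAGTCTGGCACGGTGAAGAGACATGAGAGGT--AGAATAAGTGGGAGGCCCCCGGCGCCCGGCCCCGTC"
QUERY = CSSW_ROW.replace("-", "")


def deletion_placements(ref: str, query: str) -> list[int]:
    """Return every 0-based start offset where removing one contiguous run
    of len(ref) - len(query) bases from ref yields exactly query."""
    gap = len(ref) - len(query)
    if gap <= 0:
        raise ValueError("ref must be longer than query")
    return [
        start
        for start in range(len(ref) - gap + 1)
        if ref[:start] + ref[start + gap:] == query
    ]


def render_query_row(ref: str, start: int, gap: int) -> str:
    """Draw the gapped query row for a deletion of `gap` bases at `start`."""
    return ref[:start] + "-" * gap + ref[start + gap:]


gap = len(REF) - len(QUERY)
placements = deletion_placements(REF, QUERY)
leftmost_row = render_query_row(REF, placements[0], gap)
rightmost_row = render_query_row(REF, placements[-1], gap)

# Reversing both sequences mirrors the offsets, so the rightmost placement
# on the reversed pair maps back to the leftmost placement on the originals.
rev_rightmost = deletion_placements(REF[::-1], QUERY[::-1])[-1]
mirrored_start = len(REF) - gap - rev_rightmost

print(placements)
print(leftmost_row)
print(rightmost_row)
```

All of these placements contain the same number of matches and the same single gap, so any scoring scheme that depends only on those counts cannot distinguish them; which one a tool reports comes down to its tie-breaking order.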

There is currently no parameter to control this. Note that other tools may not be consistent with each other either (e.g., some prefer mismatches to gaps, or hardcode similar tie-breaking decisions).

I hope this answers the question. Let me know.