Simplify meeting condition
RagnarGrootKoerkamp opened this issue · 4 comments
I think the DP formula can be modified to use a gap-close cost instead of a gap-open cost. This should benefit the backwards WFA, so that you don't need the o
overlap in the distance from the start and end when doing the meet-in-the-middle step.
I've quickly written this down here: https://research.curiouscoding.nl/posts/affine-gap-close-cost/
For completeness, here's the differences in the cap-close-cost formula (red: removed, green: added)
Using this, the forward and backward version will have a slightly different formula, but that could also be solved by incurring a cost of o/2 both when opening and closing the gap (see the linked post).
I haven't fully thought out the details yet of the stitching with this modified formula, but my feeling is that this should actually just work.
Since your stopping condition already seems to be to fall back to normal WFA once the score drops below 100, the total time spent computing the o
extra layers won't be significant, but this could possibly reduce the theoretical complexity of the meeting condition and simplify code there. (Oh the other hand, you would get more code because you now need different implementations for the forward and backward pass.)
Hi, Ragnar.
Sorry for not getting back to you sooner.
We were commenting on your idea and we think it is correct. WLOG, if you make the forward alignment "gap-open" and the reverse "gap-close" you can skip the o
extra steps as there is no chance to improve the breakpoint score.
Since your stopping condition already seems to be to fall back to normal WFA once the score drops below 100,
The falling-back part of the code was implemented for performance reasons and the simplicity of the prototype. Nevertheless, note that, most likely, the value of 100 is not optimum. I need to tune that.
this could possibly reduce the theoretical complexity of the meeting condition and simplify code there.
On the O(·) asymptotic analysis, the breakpoint detection comes out as a constant cost. On a more fine analysis, you are probably right.
(Oh the other hand, you would get more code because you now need different implementations for the forward and backward pass.)
:-) That is precisely why I'm not very seduced by this idea. If possible, I would avoid coding the WFA-kernels twice.
BTW: Congrats on this https://research.curiouscoding.nl/. Those posts are really nice to read :-)
Thanks ;) It will be my place to dump ideas and ongoing work for the coming time.
this could possibly reduce the theoretical complexity of the meeting condition and simplify code there.
I see this is ambiguous. I meant complexity as in the length of the theorems and code involved in handling the meeting. Indeed, I don't think this affects the runtime complexity.