smarco/BiWFA-paper

Some small remarks re the preprint introduction


Here are some remarks I had while reading your paper; I'm bundling them all in a single issue. Feel free to disagree with them though; I may well be wrong myself once or twice, and some of these are probably more my personal opinion than objectively wrong.

abstract

demand for

I think it should be just demand here

classical pairwise alignment algorithms based on DP are limited by quadratic time and memory

I disagree. E.g. Myers&Miller'88 could be considered 'classic' and uses linear memory (and quadratic time, but WFA doesn't solve this for a fixed relative error rate). (Edit: i.e. some definition of 'classic' would be good.)

You may be talking about implementations instead of algorithms here, but e.g. edlib I would probably not consider 'classic'.

The recently proposed WFA introduces an efficient algorithm to perform exact alignment in O(ns) time.

Sure, but Ukkonen'85 and Myers'86 already did this 35 years ago. Also, what does 'efficient' mean here?
(Edit: I think for such a claim (especially in the abstract), it would be best to qualify it with 'for gap-affine costs'. We're having to make similar conditionals ourselves and it's really annoying, but in the end it's the clearest.)
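
(For concreteness, since diagonal transition comes up again below: a minimal sketch of the Ukkonen'85/Myers'86 furthest-reaching idea for plain edit distance. This is hypothetical Python of mine, not code from either paper or from WFA; it just shows the O(ns) bookkeeping.)

```python
def edit_distance_dt(a: str, b: str) -> int:
    """Edit distance via diagonal transition (furthest-reaching points).

    fr[k] is the largest row i reachable on diagonal k = i - j with the
    current score s. Matches are free extensions along a diagonal; each
    unit of score steps to a neighbouring diagonal. O(ns) time, and only
    O(s) diagonals of working state (beyond the input strings).
    """
    n, m = len(a), len(b)
    fr = {0: 0}  # score 0 starts at (i, j) = (0, 0), i.e. diagonal 0
    s = 0
    while True:
        for k in list(fr):
            i, j = fr[k], fr[k] - k
            while i < n and j < m and a[i] == b[j]:  # greedy match extension
                i, j = i + 1, j + 1
            fr[k] = i
            if i == n and j == m:
                return s
        s += 1
        nxt = {}
        for k, i in fr.items():
            # insertion -> diagonal k-1; substitution -> k; deletion -> k+1
            for k2, i2 in ((k - 1, i), (k, i + 1), (k + 1, i + 1)):
                if 0 <= i2 <= n and 0 <= i2 - k2 <= m:
                    nxt[k2] = max(nxt.get(k2, -1), i2)
        fr = nxt

assert edit_distance_dt("GATTACA", "GCATGCU") == 4
```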

1 Mbp long

The results section doesn't actually mention this number, or the actual upper bound on sequence length.

noisy

What kind of error rate and model? 10%/20%, uniform errors or big indels?

introduction

Classical approaches ... these methods often require a matrix

Drop 'often'? depending on your definition of 'classical'. (Later you write 'the DP matrix', assuming its presence anyway.)

multiple variations have been proposed over the years

It would be nice to cite the diagonal-transition algorithms here (i.e. the other ukkonen'85 paper, and myers86), instead of (I think) more review-oriented papers.

'variations': clarify what kind of variations (implementation? algorithm?)

notable optimizations include bit parallel techniques (Myers 1986...

Myers '86 doesn't talk about bit parallel at all.

among other methods (ukkonen 85)

Again, I think you want the other Ukkonen85 paper, Algorithms for Approximate String Matching, not Finding approximate patterns in strings, which is about k-approximate-string-matching, not global pairwise alignment.

nonetheless, all these exact methods retain the quadratic requirements of the original DP algorithm

time or space requirements?
Existing diagonal-transition methods already achieve O(ns) (subquadratic?) time or O(n) space.

heuristic methods

(personal request:) consider using approximate methods. heuristic is confusing in the context of A*, e.g. https://github.com/eth-sri/astarix

O(n) (myers and miller 88)

More precise would be O(min(n,m) + lg max(n,m)) space usage.

O(s) memory

I'm always a bit on the fence about using less memory than the size of the input, O(n+m); maybe clarify this? You should highlight that the output is compressed, and that return D x m means exactly that (as in CIGAR strings), and not return DDD..DDD containing m copies of D (which was my first interpretation), since in that case the space requirement would become O(max(n,m)) just to store the output alignment.

Regarding the space analysis in section 2.6:
Currently you analyze the memory usage of Algorithm 1, but don't say anything about Algorithm 3.
Myers86 has a nice discussion on this, and also includes the log(s) = o(s) term, and writes that his algorithm uses O(d) working memory.

state-of-the-art

could you give one or two examples already here? It's convenient to see what are considered the best tools at the time of publication when looking back.

general

You mix BiWFA and BiWFA algorithm. I understand that WFA = WaveFront Alignment, but I'm not exactly sure what this means without it being followed by algorithm. It seems like you use just WFA for the theoretical algorithm, but then what does the 'algorithm' suffix mean?

In the discussion you write we expect the BiWFA to enable .... Is this talking about the algorithm? In general this leads (for me) to some confusion between when you mean the (theoretical) algorithm, and when you're talking about your particular implementation, also called BiWFA. (But saying the BiWFA to me implies algorithm, not your implementation.)

edit re terminology:

It would be good if we have consistent usage of the following terms:

  • diagonal-transition method/algorithm: the general idea of the O(ns) (or O(n+s^2)) algorithm. WFA is a diagonal transition method.
  • WFA: either your extension of this concept to gap-affine, or more specifically your implementation
  • WFA algorithm: same as WFA? Or different? Myself for the longest time I understood WFA as WaveFront Algorithm, without Alignment at all.

(Actually we're struggling with the naming problem ourselves -- not sure whether to give a name to the idea/method, the implementation, or both.)

There's also some inconsistency between the WFA and BiWFA papers now:
WFA paper: The wavefront alignment algorithm (WFA)
BiWFA paper: The wavefront alignment (WFA) algorithm

(I will do this in parts -- too much for a single pass)

abstract

demand for
I think it should be just demand here

Correct.

classical pairwise alignment algorithms based on DP are limited by quadratic time and memory
I disagree. E.g. Myers&Miller'88 could be considered 'classic' and uses linear memory (and quadratic time, but WFA doesn't solve this for a fixed relative error rate). (Edit: i.e. some definition of 'classic' would be good.) You may be talking about implementations instead of algorithms here, but e.g. edlib I would probably not consider 'classic'.

Although I understand your point, I would not consider Myers&Miller'88 "classical". IMHO, Myers&Miller'88 deserves more attention than it gets. I have scarcely seen it used within tools/applications. I often see implemented "classical" banded, antidiagonal-SIMD, or X-drop-based methods that suffer from quadratic time and memory requirements.

I could remove the "Classical" word here, but then it would be completely untrue that "pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory". So, I threw in the "Classical" denomination, thinking about broadly established algorithms/implementations based on computing the DP matrix (or a portion of it, at least).

The recently proposed WFA introduces an efficient algorithm to perform exact alignment in O(ns) time.
Sure, but Ukkonen'85 and Myers'86 already did this 35 years ago. Also, what does 'efficient' mean here?

Ukkonen'85 and Myers'86 are edit-distance based methods, whereas WFA supports {edit, gap-linear, n-piece gap-affine} penalties. Efficient in the sense that it has better complexity than $O(n^2)$ algorithms and outperforms (in practice) other tools/implementations.
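
For reference (my notation here, not verbatim from the paper), the penalty families meant above, for a gap of length $\ell$ with gap-open penalty $o$ and gap-extend penalty $e$:

$$g_{\text{linear}}(\ell) = \ell \cdot e, \qquad g_{\text{affine}}(\ell) = o + \ell \cdot e, \qquad g_{\text{2-piece}}(\ell) = \min(o_1 + \ell \cdot e_1,\ o_2 + \ell \cdot e_2)$$

Edit distance is the special case where mismatches and single-character gaps all cost 1.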

(Edit: I think for such a claim (especially in the abstract), it would be best to qualify it with 'for gap-affine costs'. We're having to make similar conditionals ourselves and it's really annoying, but in the end it's the clearest.)

I see your point, but then I don't want to restrict the WFA to "gap-affine" only when it can do much more. I opted for leaving it general/unqualified.

1 Mbp long
The results section doesn't actually mention this number, or the actual upper bound on sequence length.

The maximum sequence length appears in the "Experimental setup" section.

noisy
What kind of error rate and model? 10%/20%, uniform errors or big indels?

No assumption about the error model is made. The average error rate is indicated to be 5-10%.

introduction

Classical approaches ... these methods often require a matrix
Drop 'often'? depending on your definition of 'classical'. (Later you write 'the DP matrix', assuming its presence anyway.)

Ok.

multiple variations have been proposed over the years
It would be nice to cite the diagonal-transition algorithms here (i.e. the other ukkonen'85 paper, and myers86), instead of (I think) more review-oriented papers.

So, your comment made me realize that the references here are wrong. Here, I want to emphasize that gap-affine alignment algorithms have quadratic complexity. Note that the last paragraph ended by saying "Crucially, gap-affine penalties can be implemented efficiently using additional DP matrices". Because diagonal-transition algorithms are not gap-affine, they don't qualify to be included in this list of references.

'variations': clarify what kind of variations (implementation? algorithm?)

In this context, "variations" can be broadly understood as any modification to the method, whether in the algorithm, the implementation, or anything in between. This paragraph just wants to acknowledge/emphasize the multiple DP-based methods (algorithms and implementations) developed over the years and their limitations. But I can change it to "optimizations", restricting the scope to "improvements over gap-affine implementations".

notable optimizations include bit parallel techniques (Myers 1986...
Myers '86 doesn't talk about bit parallel at all.

Right! That reference is misplaced.

among other methods (ukkonen 85)
Again, I think you want the other Ukkonen85 paper, Algorithms for Approximate String Matching, not Finding approximate patterns in strings, which is about k-approximate-string-matching, not global pairwise alignment.

Yes, you are right.

nonetheless, all these exact methods retain the quadratic requirements of the original DP algorithm
time or space requirements?
Existing diagonal-transition methods already achieve O(ns) (subquadratic?) time or O(n) space.

Yes, you are right. But, again, diagonal-transition algorithms are not gap-affine.

heuristic methods
(personal request:) consider using approximate methods. heuristic is confusing in the context of A*, e.g. https://github.com/eth-sri/astarix

Believe me; I do prefer "approximate methods". But in the context of bioinformatics, the preferred term is "heuristic", AFAIK.

O(n) (myers and miller 88)
More precise would be O(min(n,m) + lg max(n,m)) space usage.

Well, yeah, but come on! $\log(n)$ is a subdominant term. I don't want to be that picky with such a great paper.

O(s) memory
I'm always a bit on the fence about using less memory than the size of the input, O(n+m); maybe clarify this? You should highlight that the output is compressed, and that return D x m means exactly that (as in CIGAR strings), and not return DDD..DDD containing m copies of D (which was my first interpretation), since in that case the space requirement would become O(max(n,m)) just to store the output alignment.

I know! Not the first "complaint"/concern I have received.

(1) As for the input sequences, I can always clarify: "O(s) memory, excluding the storage of the input sequences".
(2) As for the output: the memory used is proportional to the optimal alignment score, $O(s)$, excluding the storage of the input sequences. Also, note that the output alignment only requires storing the position $(i,j)$ of the mismatches, insertions, and deletions (matches can be inferred from the gaps).

How does it sound now?

Regarding the space analysis in section 2.6:
Currently you analyze the memory usage of Algorithm 1, but don't say anything about Algorithm 3.
Myers86 has a nice discussion on this, and also includes the log(s) = o(s) term, and writes that his algorithm uses O(d) working memory.

Here, I have to disagree. For the various recursive calls, the memory required is $O(\log s)$, $O(2\log(s/2))$, $O(4\log(s/4))$, etc. However, data structures are discarded before entering each recursive call, so the maximum memory use occurs in the outermost call.
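
In recurrence form (a sketch, assuming for simplicity that both halves carry roughly half the score): since the wavefronts of a call are freed before its recursive subcalls run, the peak memory $M(s)$ satisfies

$$M(s) = \max\bigl(c \cdot s,\ M(s/2)\bigr), \qquad M(1) = O(1) \quad\Longrightarrow\quad M(s) = O(s).$$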

Makes sense?

state-of-the-art
could you give one or two examples already here? It's convenient to see what are considered the best tools at the time of publication when looking back.

I don't know which part of the text you refer to. But I guess it's the ones benchmarked in the experimental part.

General

You mix BiWFA and BiWFA algorithm. I understand that WFA = WaveFront Alignment, but I'm not exactly sure what this means without it being followed by algorithm. It seems like you use just WFA for the theoretical algorithm, but then what does the 'algorithm' suffix mean?

I know. I agree that WFA = WaveFront Alignment. However, always putting the term "algorithm" at the end becomes tedious very quickly. Therefore, we usually use the term "WFA"/"BiWFA" to refer to the algorithm. Then, from Section 3 (Results), we evaluate the performance of our implementation of the algorithm, referring to the implementation simply as BiWFA.

Does it feel confusing?

In the discussion you write we expect the BiWFA to enable .... Is this talking about the algorithm?

Yes.

In general this leads (for me) to some confusion between when you mean the (theoretical) algorithm, and when you're talking about your particular implementation, also called BiWFA. (But saying the BiWFA to me implies algorithm, not your implementation.)

Ok. That should be the idea. A better implementation could improve the results, but the algorithm is still the same (i.e., BiWFA).

edit re terminology:
It would be good if we have consistent usage of the following terms:
diagonal-transition method/algorithm: the general idea of the O(ns) (or O(n+s^2)) algorithm. WFA is a diagonal transition method.

Sure. Where should I change or rephrase this?

WFA: either your extension of this concept to gap-affine, or more specifically your implementation

Yes, looks good to me. WFA is an extension of the diagonal-transition algorithms to more general alignment distances (e.g., gap-linear, gap-affine, ...).

WFA algorithm: same as WFA? Or different? Myself for the longest time I understood WFA as WaveFront Algorithm, without Alignment at all. (Actually we're struggling with the naming problem ourselves -- not sure whether to give a name to the idea/method, the implementation, or both.)

No. I would say WFA = WaveFront Alignment (algorithm). There can be many implementations, like the "WFA2-lib" or the one provided in the repo of this paper for evaluation purposes (unnamed).

There's also some inconsistency between the WFA and BiWFA papers now:
WFA paper: The wavefront alignment algorithm (WFA)
BiWFA paper: The wavefront alignment (WFA) algorithm

I couldn't find the sentence "The wavefront alignment (WFA) algorithm" anywhere. Can you double-check and let me know where it is?

Thanks for the replies, Santiago, much appreciated!

abstract

classical pairwise alignment algorithms based on DP are limited by quadratic time and memory
I disagree. E.g. Myers&Miller'88 could be considered 'classic' and uses linear memory (and quadratic time, but WFA doesn't solve this for a fixed relative error rate). (Edit: i.e. some definition of 'classic' would be good.) You may be talking about implementations instead of algorithms here, but e.g. edlib I would probably not consider 'classic'.

Although I understand your point, I would not consider Myers&Miller'88 "classical". IMHO, Myers&Miller'88 deserves more attention than it gets. I have scarcely seen it used within tools/applications. I often see implemented "classical" banded, antidiagonal-SIMD, or X-drop-based methods that suffer from quadratic time and memory requirements.

I could remove the "Classical" word here, but then it would be completely untrue that "pairwise alignment algorithms based on dynamic programming are strongly limited by quadratic requirements in time and memory". So, I threw in the "Classical" denomination, thinking about broadly established algorithms/implementations based on computing the DP matrix (or a portion of it, at least).

Yeah, I see your point; I can't really come up with a good alternative for 'classical', apart from maybe 'commonly used' or so, since it seems you refer more to what is actually in use than to the (sadly somewhat forgotten) older papers. Either way, 'classical' is probably fine.

The recently proposed WFA introduces an efficient algorithm to perform exact alignment in O(ns) time.
Sure, but Ukkonen'85 and Myers'86 already did this 35 years ago. Also, what does 'efficient' mean here?

Ukkonen'85 and Myers'86 are edit-distance based methods, whereas WFA supports {edit, gap-linear, n-piece gap-affine} penalties. Efficient in the sense that it has better complexity than $O(n^2)$ algorithms and outperforms (in practice) other tools/implementations.

(Edit: I think for such a claim (especially in the abstract), it would be best to qualify it with 'for gap-affine costs'. We're having to make similar conditionals ourselves and it's really annoying, but in the end it's the clearest.)

I see your point, but then I don't want to restrict the WFA to "gap-affine" only when it can do much more. I opted for leaving it general/unqualified.

Since you explicitly say wavefront alignment (WFA) algorithm here in the paper, this sentence really only talks about the gap-affine scoring in $O(ns)$, so being explicit about this is good I think, even though the tool can do more.
In my opinion the sentence as-is is too easily misread as the WFA algorithm being the first to solve any kind of alignment in $O(ns)$. Maybe something along the lines of 'the WFA algorithm extends/generalizes the existing $O(ns)$ (diagonal-transition) method to affine costs'?

1 Mbp long
The results section doesn't actually mention this number, or the actual upperbound on sequence length.

The maximum sequence length appears in the "Experimental setup" section.

I only see minimal length 500 kbp and average length 630 kbps (drop the s maybe?). A maximum would be nice to have, since that is likely what determines the actual memory usage.

introduction

multiple variations have been proposed over the years
It would be nice to cite the diagonal-transition algorithms here (i.e. the other ukkonen'85 paper, and myers86), instead of (I think) more review-oriented papers.

So, your comment made me realize that the references here are wrong. Here, I want to emphasize that gap-affine alignment algorithms have quadratic complexity. Note that the last paragraph ended by saying "Crucially, gap-affine penalties can be implemented efficiently using additional DP matrices". Because diagonal-transition algorithms are not gap-affine, they don't qualify to be included in this list of references.

Hmm, yes, it makes sense to state the limitations of current algorithms for affine costs in this paragraph. In that case, repeating 'affine' in the first sentence would be good for clarity. The citations of Ukkonen'85 and Myers'99 threw me off, since those two do not mention affine costs at all, while most of the others do seem to mention it. Note that Loving 2014 (BitPAl) mentions affine costs only in future work, but from my understanding it doesn't actually support gap-affine costs.

In general, this paper merges three ideas: affine costs, diagonal-transition, and linear-space alignments, so it makes sense to write the introduction as an overview of current affine-cost aligners, since that is the problem you solve, while the others are performance characteristics of solutions.

However, you do miss out on a paragraph on previous work on linear-memory alignment this way, and in particular, I think something similar to the following paragraph has a place in the introduction (and I would suggest that you credit all those 3 papers for previously applying the divide&conquer technique):

Linear memory alignments were first introduced by Hirschberg '75,
who used a divide-and-conquer approach for solving LCS with linear
memory. This technique was later used by Myers '86 to solve edit
distance in linear memory by applying it to the diagonal-transition
method (introduced in the same paper [and Ukkonen'85]), and
Myers&Miller '88 applied it to the Smith-Waterman-Gotoh
algorithm (Gotoh'82) to solve gap-affine alignments in linear space. We apply this
method to WFA to obtain gap-affine alignments in linear time.

On Myers&Miller '88: You do cite them currently, but only referring to them as 'current best', not as 'we apply the same technique to a different algorithm'. Also, side note, their introduction applies surprisingly well to WFA/WFAlm having many different memory modes, talking about how many space-saving techniques have been proposed, but the method by Hirschberg is superior, leading to a linear space version of Gotoh's algorithm.

nonetheless, all these exact methods retain the quadratic requirements of the original DP algorithm
time or space requirements?
Existing diagonal-transition methods already achieve O(ns) (subquadratic?) time or O(n) space.

Yes, you are right. But, again, diagonal-transition algorithms are not gap-affine.

O(n) (myers and miller 88)
More precise would be O(min(n,m) + lg max(n,m)) space usage.

Well, yeah, but come on! $\log(n)$ is a subdominant term. I don't want to be that picky with such a great paper.

I actually think including it shows appreciation and understanding of their method, and it is correct, while $O(n)$ is imprecise (it could be the minimum or the maximum, as conventions for which is the larger vary), and $O(\min(n,m))$ is false.

O(s) memory
I'm always a bit on the fence about using less memory than the size of the input, O(n+m); maybe clarify this? You should highlight that the output is compressed, and that return D x m means exactly that (as in CIGAR strings), and not return DDD..DDD containing m copies of D (which was my first interpretation), since in that case the space requirement would become O(max(n,m)) just to store the output alignment.

I know! Not the first "complaint"/concern I have received.

(1) As for the input sequences, I can always clarify: "O(s) memory, excluding the storage of the input sequences". (2) As for the output: the memory used is proportional to the optimal alignment score, O(s), excluding the storage of the input sequences. Also, note that the output alignment only requires storing the position (i,j) of the mismatches, insertions, and deletions (matches can be inferred from the gaps).

How does it sound now?

(1) input: yes SGTM
(2) output: yes, something like this is nice. Maybe say position (and length) of mismatches, insertions, and deletions, or actually what CIGAR and your pseudocode do: length of (runs of) matches, mismatches, insertions, and deletions.
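
(I.e., something like this hypothetical helper, just to show the run-length form I mean:)

```python
from itertools import groupby

def to_cigar(ops: str) -> str:
    """Run-length encode an op string, e.g. 'MMMMIMMDD' -> '4M1I2M2D'.
    The encoded form is O(#runs), unlike the O(n+m) expanded string."""
    return "".join(f"{sum(1 for _ in g)}{op}" for op, g in groupby(ops))

assert to_cigar("MMMMIMMDD") == "4M1I2M2D"
```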

Regarding the space analysis in section 2.6:
Currently you analyze the memory usage of Algorithm 1, but don't say anything about Algorithm 3.
Myers86 has a nice discussion on this, and also includes the log(s) = o(s) term, and writes that his algorithm uses O(d) working memory.

Here, I have to disagree. For the various recursive calls, the memory required is $O(\log s)$, $O(2\log(s/2))$, $O(4\log(s/4))$, etc. However, data structures are discarded before entering each recursive call, so the maximum memory use occurs in the outermost call.

Makes sense?

I think I missed the outermost call argument before, so I agree with you here. This could be made slightly more explicit by saying before entering a recursive call in Algorithm 3.

state-of-the-art
could you give one or two examples already here? It's convenient to see what are considered the best tools at the time of publication when looking back.

I don't know which part of the text you refer to. But I guess that the ones benchmarked on the experimental part.

You could mention edlib, BitPAl, and ksw2 just before the last paragraph of the introduction.

General

You mix BiWFA and BiWFA algorithm. I understand that WFA = WaveFront Alignment, but I'm not exactly sure what this means without it being followed by algorithm. It seems like you use just WFA for the theoretical algorithm, but then what does the 'algorithm' suffix mean?

I know. I agree that WFA = WaveFront Alignment. However, always putting the term "algorithm" at the end becomes tedious very quickly. Therefore, we usually use the term "WFA"/"BiWFA" to refer to the algorithm. Then, from Section 3 (Results), we evaluate the performance of our implementation of the algorithm, referring to the implementation simply as BiWFA.

Does it feel confusing?

In the evaluation (3.2), it would be clearer to write our implementation [of the BiWFA algorithm] rather than BiWFA at least once or a few times, since the algorithm in itself doesn't have an absolute memory usage. The way it is now does seem to imply that your implementation is actually called BiWFA.

edit re terminology:
It would be good if we have consistent usage of the following terms:
diagonal-transition method/algorithm: the general idea of the O(ns) (or O(n+s^2)) algorithm. WFA is a diagonal transition method.

Sure. Where should I change or rephrase this?

Not sure exactly; just to fix it for future usage ;)

There's also some inconsistency between the WFA and BiWFA papers now:
WFA paper: The wavefront alignment algorithm (WFA)
BiWFA paper: The wavefront alignment (WFA) algorithm

I couldn't find the sentence "The wavefront alignment (WFA) algorithm" anywhere. Can you double-check and let me know where it is?

First mention of WFA in the abstract of BiWFA paper. (Sorry; without The.)


Ok, so to summarize my understanding:

  • WFA = Wavefront Alignment
  • We can say, e.g., The WFA algorithm has $O(ns)$ runtime complexity.
  • Since that gets repetitive, we can also simply say WFA has $O(ns)$ runtime complexity.

Personally, I'm not a fan of "The WFA has $O(ns)$ runtime complexity", which occurs a few times throughout the paper. To me, it's either the [WFA] algorithm or simply the name of the thing, WFA.