gustf/js-levenshtein

Improve Code Readability

philbarresi opened this issue · 4 comments

I'm going to try my hand at this if I can wrap my head around the code. As it stands, it's fairly hard to understand what on earth is going on.

gustf commented

Nice that you are interested in learning the algorithm. I will try to give you some hints.
It is probably the loop unrolling that are making it harder to understand.
In fact, you can easily take away the whole loop unrolling part and still have a working function.
I.e. removing the loop at line 70 to 90.

for (; x < lb - 3;) {
  bx0 = b.charCodeAt(offset + (d0 = x));
  bx1 = b.charCodeAt(offset + (d1 = x + 1));
  bx2 = b.charCodeAt(offset + (d2 = x + 2));
  bx3 = b.charCodeAt(offset + (d3 = x + 3));
  dd = (x += 4);
  for (y = 0; y < len; y += 2) {
    dy = vector[y];
    ay = vector[y + 1];
    d0 = _min(dy, d0, d1, bx0, ay);
    d1 = _min(d0, d1, d2, bx1, ay);
    d2 = _min(d1, d2, d3, bx2, ay);
    dd = _min(d2, d3, dd, bx3, ay);
    vector[y] = dd;
    d3 = d2;
    d2 = d1;
    d1 = d0;
    d0 = dy;
  }
}

Something else that might be a bit cryptic is that vector is used both for the distances and as a char code cache for string a. Every second is distance and every second is a char code. This is for performance reasons.

Hope this helps a bit

gustf commented

@philbarresi You want some more info about this, just let me know :)

I will probably ask more about it when I have a chance to start working on this. Got blocked on other stuff. The big thing I was trying to understand is what bx_ and other variables were corresponding to, from there I think I can get the rest of it.

gustf commented

Ok, bx* is the char code of string b at position x+0, x+1, x+2, x+3 respectively, cached in variables to calculate all 4 in each iteration of the inner loop.