burgaard/string-algorithms

Erroneous result in longestCommonSubstring

Closed this issue · 2 comments

Here is a test case for longestCommonSubstring, currently failing in version 1.0.20

longestCommonSubstring(["abc - 48h", "abc - 108h", "abc - 168h"])
=> '8h', which is wrong

As a simple demonstration that something is fundamentally wrong, is that simply appending one more string to the aformentioned arguments makes the function return a longer common substring, which defies any logic I've got.

longestCommonSubstring(["abc - 48h", "abc - 108h", "abc - 168h", "abc - 228h"])
=> 'abc - ', looks good to me

Many thanks

I've started to rummage a bit more this morning, and further experimentation suggests there may be a comparison issue somewhere.

The following code

let charcode = 32;
let sep1;
const sep2 = 'D';
const samples = [];
while (charcode <= 126) {
  sep1 = String.fromCharCode(charcode);
  const result = longestCommonSubstring(
    [
      `abc${sep1}ab`,
      `abc${sep2}ab`,
      `abc${sep1}ab`,
    ],
  );
  samples.push([sep1, result[0] === 'abc', charcode]);
  charcode++;
}
console.log(samples.map(xp => `${xp[0]} (${xp[2]}): ${xp[1]}`).join('\n'));

Reveals that the result is wrong as long as the charcode of sep1 is inferior or equal to that of sep2:

(32): false
! (33): false
" (34): false
... 31 more false ...
B (66): false
C (67): false
D (68): false
E (69): true <= hinge
F (70): true
G (71): true
... 52 more true ...
| (124): true
} (125): true
~ (126): true

I've been able to repeat this observation with different values of sep2 ('!', '3', 'T')

Great find and great detective work.