k-takata/Onigmo

Performance problem with /k/i and /s/i

k-takata opened this issue · 0 comments

Originally reported at kkos/oniguruma#71.

If a pattern is case-insensitive and contains the letter "k" or "s", matching slows down when the encoding is UTF-8.
Onigmo uses different optimization methods for fixed strings. It uses Sunday's quick search with support for case-insensitive search instead of Boyer-Moore search (case-sensitive). However, the case-insensitive search has a problem: /s/i also matches ſ (U+017F, LATIN SMALL LETTER LONG S) and /k/i also matches K (U+212A, KELVIN SIGN). These characters are 2 and 3 bytes long in UTF-8 respectively, so their lengths differ from those of the original characters. Therefore the optimization is turned off.
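The length mismatch can be checked with a few lines of Python. This is only an illustration of the encoding facts above, not Onigmo code; the helper name `utf8_len` and the `FOLD_PAIRS` table are made up for this sketch.

```python
# Illustration only: the two case-insensitive equivalents named in this issue,
# paired with the ASCII letter each one folds to.
FOLD_PAIRS = [
    ("s", "\u017f"),  # LATIN SMALL LETTER LONG S
    ("k", "\u212a"),  # KELVIN SIGN
]


def utf8_len(ch: str) -> int:
    """Return the number of bytes ch occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


for ascii_ch, variant in FOLD_PAIRS:
    # str.casefold() applies Unicode case folding; both variants fold to the ASCII letter.
    assert variant.casefold() == ascii_ch
    print(ascii_ch, utf8_len(ascii_ch), f"U+{ord(variant):04X}", utf8_len(variant))
```

The loop prints `s 1 U+017F 2` and `k 1 U+212A 3`: a one-byte pattern character can correspond to a two- or three-byte character in the subject string, which is what breaks a search that assumes equal lengths.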

The actual problem is that for a pattern such as /----k/i, the first 4 characters (----) should still be used for optimization, but Onigmo currently turns the optimization off entirely.
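The idea of keeping the usable part of the pattern could be sketched roughly as follows. This is a hypothetical illustration in Python, not the planned fix and not Onigmo's C implementation. It assumes an ASCII-only pattern and assumes that "k" and "s" (in either case) are the only letters whose case-insensitive equivalents have a different UTF-8 length. The names `VARIABLE_LENGTH_FOLD` and `optimizable_prefix` are invented for this sketch.

```python
# Assumption for this sketch: in an ASCII pattern, only these letters have
# case-insensitive equivalents with a different UTF-8 byte length.
VARIABLE_LENGTH_FOLD = frozenset("kKsS")


def optimizable_prefix(pattern: str) -> str:
    """Return the longest leading part of pattern that avoids the problem letters."""
    for i, ch in enumerate(pattern):
        if ch in VARIABLE_LENGTH_FOLD:
            # Stop before the first letter whose equivalents differ in length.
            return pattern[:i]
    return pattern


print(optimizable_prefix("----k"))
```

For the pattern in the example this prints `----`, the four characters that a fixed-string search could still use. A pattern that begins with "k" or "s" yields an empty prefix under this simple scheme, so the search would gain nothing in that case.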

I'm preparing a fix for this.