Performance problem with /k/i and /s/i
k-takata opened this issue · 0 comments
Originally reported at kkos/oniguruma#71.
If a pattern is case-insensitive and contains the letter "k" or "s", matching slows down when the encoding is UTF-8.
Onigmo uses different optimization methods for fixed strings. It uses Sunday's quick search with support for case-insensitive search instead of Boyer-Moore search (case-sensitive). However, there is a problem with the case-insensitive search: /s/i also matches ſ (U+017F, LATIN SMALL LETTER LONG S) and /k/i also matches K (U+212A, KELVIN SIGN). These characters are 2 and 3 bytes long in UTF-8, respectively, so their byte lengths differ from those of the original 1-byte characters. Therefore the optimization is turned off.
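To make the length mismatch concrete, here is a small Python sketch (not Onigmo code). The table `FOLD_VARIANTS` and the helper `utf8_lengths` are hypothetical names introduced only for this illustration, and the table lists just the two letters discussed in this issue.

```python
# Hypothetical table for illustration: characters that a case-insensitive
# match treats as equivalent to ASCII "s" and "k".
FOLD_VARIANTS = {
    "s": ["s", "S", "\u017f"],  # U+017F LATIN SMALL LETTER LONG S
    "k": ["k", "K", "\u212a"],  # U+212A KELVIN SIGN
}


def utf8_lengths(ch):
    """Map each case-fold variant of ch to its UTF-8 byte length."""
    return {v: len(v.encode("utf-8")) for v in FOLD_VARIANTS[ch]}


def has_uniform_length(ch):
    """True if every variant of ch occupies the same number of bytes."""
    return len(set(utf8_lengths(ch).values())) == 1


if __name__ == "__main__":
    for letter in ("s", "k"):
        print(letter, utf8_lengths(letter), has_uniform_length(letter))
```

The ASCII forms take 1 byte each, while U+017F takes 2 bytes and U+212A takes 3, so a search that assumes a fixed byte length per pattern character cannot cover all variants.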
The actual problem is that if the pattern is /----k/i, the first 4 characters (----) should be used for optimization, but Onigmo currently turns the optimization off entirely.
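The desired behavior can be sketched in Python as follows. This is only an illustration of the idea, not the planned fix: `VARIABLE_LENGTH_FOLDS` and `optimizable_prefix` are hypothetical names, and the set contains only the two letters named in this issue.

```python
# Hypothetical set for illustration: ASCII letters that have a case-fold
# variant with a different UTF-8 byte length (only those named in this issue).
VARIABLE_LENGTH_FOLDS = {"k", "s"}


def optimizable_prefix(literal):
    """Return the leading part of a case-insensitive literal that contains
    no character with a variable-length case-fold variant."""
    for i, ch in enumerate(literal):
        if ch.lower() in VARIABLE_LENGTH_FOLDS:
            # Stop before the first problematic character; the prefix
            # collected so far could still drive a fixed-string search.
            return literal[:i]
    return literal


if __name__ == "__main__":
    print(repr(optimizable_prefix("----k")))
    print(repr(optimizable_prefix("k")))
```

For the literal ----k the function returns ----, the part that could still be used for the fixed-string search. An empty result, as for the literal k alone, would mean no optimization is possible.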
I'm preparing a fix for this.