k-takata/Onigmo

Matching the end of a string followed by an empty greedy regex and a word boundary (.*\b) fails

jhriggs opened this issue · 2 comments

For context, see https://bugs.ruby-lang.org/issues/13892.

This is a very specific regex failure that occurs when a pattern ends with .*\b and the part of the pattern before the .* matches the final character of the string, leaving .* to match the empty string. For example:

"abc" =~ /c.*\b/
"abc" =~ /abc.*\b/
"abc" =~ /\b.*abc.*\b/

In Ruby 1.8.7 and every other language I have tested (Perl, PCRE, PHP, JavaScript, Python, Go, ...) this matches. With Oniguruma/Onigmo, though, the greedy .* appears to cause the \b to fail even though it should match. I have tested this with every version of Ruby >= 1.9, the included simple.c code, and PHP's mb_ereg(). The problem only occurs when the pattern matches at the end of the string being matched (i.e. matching against abcd or xyzabcdef works, but abc or simply c does not). Based on my non-exhaustive testing, this only occurs with .*\b; other patterns like .?\b and specific characters such as d*\b work as expected.
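For anyone who wants to check their own build, here is a minimal sketch that runs the three reported patterns alongside the cases described above as working. The names REPORTED, CONTROLS and match_offset are made up for illustration and are not part of Ruby or Onigmo. An engine that behaves like the other languages listed should report offsets 2, 0 and 0 for the reported cases; an affected build would report nil for them.

```ruby
# Cases from the report: each pattern ends in .*\b and the text before
# the .* consumes the last character of the subject.
REPORTED = [
  ["abc", /c.*\b/],
  ["abc", /abc.*\b/],
  ["abc", /\b.*abc.*\b/]
].freeze

# Cases the report describes as working on affected builds.
CONTROLS = [
  ["abcd",      /c.*\b/],   # the match does not end on the pattern's last literal
  ["xyzabcdef", /abc.*\b/],
  ["abc",       /c.?\b/],   # .? instead of .*
  ["abc",       /cd*\b/]    # a specific character instead of .
].freeze

# Hypothetical helper: offset of the first match, or nil if there is none.
def match_offset(subject, pattern)
  subject =~ pattern
end

(REPORTED + CONTROLS).each do |subject, pattern|
  offset = match_offset(subject, pattern)
  puts "#{subject.inspect} =~ #{pattern.inspect}  #=> #{offset.inspect}"
end
```

Comparing the first three lines of output with the last four should show whether a given Ruby build is affected.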

See also:

Corresponding kkos/oniguruma#70 opened for Oniguruma.

Thank you for the report. I imported the fix from Oniguruma.