Matching the end of a string followed by an empty greedy regex and a word boundary (.*\b) fails
jhriggs opened this issue · 2 comments
For context, see https://bugs.ruby-lang.org/issues/13892.
This is a very specific regex failure that occurs when the final character of the string is matched by the end of a pattern that terminates with .*\b. For example:
"abc" =~ /c.*\b/
"abc" =~ /abc.*\b/
"abc" =~ /\b.*abc.*\b/
In Ruby 1.8.7 and every other language I have tested (Perl, PCRE, PHP, JavaScript, Python, Go, ...) this matches. With Oniguruma/Onigmo, though, it appears that the greedy .* causes the \b to fail, though it should match. I have tested this with every version of Ruby >= 1.9, the included simple.c code, and PHP's mb_ereg(). The problem only occurs when the pattern matches at the end of the string being matched (i.e. matching against abcd or xyzabcdef works, but abc or simply c does not). Based on my non-exhaustive testing, this only occurs with .*\b; other patterns like .?\b and specific characters such as d*\b work as expected.
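A minimal sketch for checking the cases above on a given Ruby build. The names CASES and match_report are hypothetical helpers, not part of any library. Pairing /abc.*\b/ with the longer subjects abcd and xyzabcdef is an assumption about which pattern the contrast cases used. The result depends on the regex engine version, so no expected output is given for the three failing cases.

```ruby
# Subject/pattern pairs. The first three come straight from the report;
# the last two are the contrast subjects, paired with /abc.*\b/ as an assumption.
CASES = [
  ["abc",       /c.*\b/],
  ["abc",       /abc.*\b/],
  ["abc",       /\b.*abc.*\b/],
  ["abcd",      /abc.*\b/],
  ["xyzabcdef", /abc.*\b/],
].freeze

# Returns one [subject, pattern source, matched?] row per case.
def match_report(cases)
  cases.map do |subject, pattern|
    [subject, pattern.source, !(subject =~ pattern).nil?]
  end
end

match_report(CASES).each do |subject, source, matched|
  puts format("%-12s %-16s %s", subject.inspect, "/#{source}/", matched ? "match" : "NO MATCH")
end
```

On an affected engine the first three rows would be expected to report NO MATCH while the last two match; on an engine with the fix, every row should match.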
See also:
- https://regex101.com/r/JBzSic/2 (PHP/PCRE, Javascript, Python, Go)
- http://fiddle.re/gkm4ad (Go, Java, Javascript, .Net, Perl, PHP, Python, XRegExp)
- http://java-regex-tester.appspot.com/regex/04925044-ca95-46c6-bec5-329057c04ab2 (Java)
Corresponding kkos/oniguruma#70 opened for Oniguruma.
Thank you for the report. I imported the fix from Oniguruma.