k-takata/Onigmo

How about making {n,m}+ as a possessive quantifier?

k-takata opened this issue · 4 comments

Currently, {n,m}+ is a possessive quantifier only in the Java and Perl syntaxes.
How about making it a possessive quantifier in the default (Ruby) syntax as well?

nurse commented

It's hard to migrate because it breaks compatibility.

irb(main):001:0> /a{2,5}+b/
=> /a{2,5}+b/
irb(main):002:0> /a{2,5}+b/=~"aaaaaaaaaaaaaaaab"
=> 0
irb(main):003:0> $&
=> "aaaaaaaaaaaaaaaab"
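
To make the incompatibility concrete, here is a minimal sketch contrasting the current Ruby reading of {n,m}+ (the interval followed by an ordinary +) with the possessive reading. The possessive behaviour is emulated with an atomic group (?>...), and the anchored pattern and input string are illustrative assumptions, not taken from the thread.

```ruby
# Sketch only: (?>...) stands in for the proposed possessive {n,m}+.
subject = "aaaaaab" # six "a" then "b" (illustrative input)

# Current Ruby syntax: parsed like (?:a{2,5})+, so the engine can
# backtrack (e.g. 4 + 2 repetitions) and still reach the "b".
current = /\Aa{2,5}+b/

# Possessive reading, emulated with an atomic group: a{2,5} grabs
# five "a" and gives none back, so "b" can never match here.
possessive = /\A(?>a{2,5})b/

p(current =~ subject)    # => 0
p(possessive =~ subject) # => nil
```

A pattern that matches today could silently stop matching under the new reading, which is the compatibility break in question.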

it would be nice for consistency and avoiding surprises.

maybe a possible route to migration could look like this?

  1. make it a SyntaxError, e.g. possessive interval quantifiers are not enabled
  2. everyone who is still updating will remove the effectless + at some point
  3. after a sufficiently long time, enable the feature
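
For step 2, code bases would need a way to find affected patterns before the error lands. The following is a rough sketch of a textual check; `INTERVAL_PLUS` and `interval_plus?` are hypothetical names, not part of Onigmo or Ruby, and the heuristic is deliberately naive (it does not understand character classes, doubled backslashes, or extended mode).

```ruby
# Hypothetical helper: flags an interval quantifier ({n}, {n,}, {n,m}
# or {,m}) that is immediately followed by "+".
# Naive assumption: a "{" preceded by a backslash is a literal brace.
INTERVAL_PLUS = /(?<!\\)\{(?:\d+(?:,\d*)?|,\d+)\}\+/

def interval_plus?(source)
  source.match?(INTERVAL_PLUS)
end

puts interval_plus?('a{2,5}+b')       # => true
puts interval_plus?('a{2,5}b')        # => false
puts interval_plus?(/x{3}+/.source)   # => true
```

A real migration would more likely rely on a warning emitted by the regex parser itself, since only the parser knows the exact syntax context.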

patch commented

Here’s an example of how Perl has deprecated regex syntax in the past in order to open the door to new syntax.

perl-5.16.0:

Unescaped literal "{" in regular expressions.

Starting with v5.20, it is planned to require a literal "{" to be escaped, for example by preceding it with a backslash. In v5.18, a deprecated warning message will be emitted for all such uses. This affects only patterns that are to match a literal "{". Other uses of this character, such as part of a quantifier or sequence as in those below, are completely unaffected:

/foo{3,5}/
/\p{Alphabetic}/
/\N{DIGIT ZERO}/

Removing this will permit extensions to Perl's pattern syntax and better error checking for existing syntax. See "Quantifiers" in perlre for an example.

perl-5.22.0:

A literal "{" should now be escaped in a pattern

If you want a literal left curly bracket (also called a left brace) in a regular expression pattern, you should now escape it by either preceding it with a backslash ("\{") or enclosing it within square brackets "[{]", or by using \Q; otherwise a deprecation warning will be raised. This was first announced as forthcoming in the v5.16 release; it will allow future extensions to the language to happen.

perl-5.26.0:

Unescaped literal "{" characters in regular expression patterns are no longer permissible

You have to now say something like "\{" or "[{]" to specify to match a LEFT CURLY BRACKET; otherwise, it is a fatal pattern compilation error. This change will allow future extensions to the language.

These have been deprecated since v5.16, with a deprecation message raised for some uses starting in v5.22. Unfortunately, the code added to raise the message was buggy and failed to warn in some cases where it should have. Therefore, enforcement of this ban for these cases is deferred until Perl 5.30, but the code has been fixed to raise a default-on deprecation message for them in the meantime.

Some uses of literal "{" occur in contexts where we do not foresee the meaning ever being anything but the literal, such as the very first character in the pattern, or after a "|" meaning alternation. Thus

qr/{fee|{fie/

matches either of the strings {fee or {fie. To avoid forcing unnecessary code changes, these uses do not need to be escaped, and no warning is raised about them, and there are no current plans to change this.

But it is always correct to escape "{", and the simple rule to remember is to always do so.

See "Unescaped left brace in regex is illegal here".

patch commented

PCRE and ICU also support possessive {n,m}+