hoaproject/Regex

Support internal options setting

Hywan opened this issue · 0 comments

See http://pcre.org/pcre.txt.
Quoting:

INTERNAL OPTION SETTING

The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and
PCRE_EXTENDED options (which are Perl-compatible) can be changed from
within the pattern by a sequence of Perl option letters enclosed
between "(?" and ")". The option letters are

i for PCRE_CASELESS
m for PCRE_MULTILINE
s for PCRE_DOTALL
x for PCRE_EXTENDED

For example, (?im) sets caseless, multiline matching. It is also
possible to unset these options by preceding the letter with a hyphen,
and a combined setting and unsetting such as (?im-sx), which sets
PCRE_CASELESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and
PCRE_EXTENDED, is also permitted. If a letter appears both before and
after the hyphen, the option is unset.
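
To make the rule above concrete for whoever picks up this issue, here is a minimal sketch of how an option-setting group could be split into "set" and "unset" options. It is written in Python for illustration only. `parse_option_setting` and `OPTION_LETTERS` are hypothetical names, not part of PCRE or of hoaproject/Regex, and only the four Perl-compatible letters quoted above are handled.

```python
# Hypothetical helper for illustration; not part of PCRE or hoaproject/Regex.
OPTION_LETTERS = {
    "i": "PCRE_CASELESS",
    "m": "PCRE_MULTILINE",
    "s": "PCRE_DOTALL",
    "x": "PCRE_EXTENDED",
}


def parse_option_setting(group):
    """Parse a group such as "(?im-sx)" into (set_options, unset_options)."""
    if not (group.startswith("(?") and group.endswith(")")):
        raise ValueError("not an option-setting group: %r" % group)
    body = group[2:-1]
    # Letters before the first hyphen are set, letters after it are unset.
    before, _hyphen, after = body.partition("-")
    for letter in before + after:
        if letter not in OPTION_LETTERS:
            raise ValueError("unsupported option letter: %r" % letter)
    unset_options = {OPTION_LETTERS[letter] for letter in after}
    # A letter that appears on both sides of the hyphen ends up unset.
    set_options = {OPTION_LETTERS[letter] for letter in before} - unset_options
    return set_options, unset_options
```

With this sketch, `parse_option_setting("(?im-sx)")` yields PCRE_CASELESS and PCRE_MULTILINE as set and PCRE_DOTALL and PCRE_EXTENDED as unset, while `parse_option_setting("(?i-i)")` reports PCRE_CASELESS as unset only.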

The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA
can be changed in the same way as the Perl-compatible options by using
the characters J, U and X respectively.

When one of these option changes occurs at top level (that is, not
inside subpattern parentheses), the change applies to the remainder of
the pattern that follows. If the change is placed right at the start of
a pattern, PCRE extracts it into the global options (and it will
therefore show up in data extracted by the pcre_fullinfo() function).
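
As a rough analogy only: Python's `re` module is not PCRE, but it treats an option group at the very start of a pattern in a similar way, folding it into the flags of the compiled pattern. The sketch below assumes that analogy is useful here and says nothing about how pcre_fullinfo() itself reports options.

```python
import re

# Leading option group, analogous to "(?im)" at the start of a PCRE pattern.
compiled = re.compile(r"(?im)^foo$")

# The inline letters show up in the compiled pattern's global flags.
caseless = bool(compiled.flags & re.IGNORECASE)
multiline = bool(compiled.flags & re.MULTILINE)
dotall = bool(compiled.flags & re.DOTALL)

# Caseless + multiline: "FOO" on its own line is found.
found = compiled.search("bar\nFOO\nbaz") is not None
```

Here `caseless`, `multiline` and `found` are true and `dotall` is false, which mirrors the idea that a leading option change becomes part of the pattern's global options.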

An option change within a subpattern (see below for a description of
subpatterns) affects only that part of the subpattern that follows it,
so

(a(?i)b)c

matches abc and aBc and no other strings (assuming PCRE_CASELESS is not
used). By this means, options can be made to have different settings
in different parts of the pattern. Any changes made in one alternative
do carry on into subsequent branches within the same subpattern. For
example,

(a(?i)b|c)

matches "ab", "aB", "c", and "C", even though when matching "C" the
first branch is abandoned before the option setting. This is because
the effects of option settings happen at compile time. There would be
some very weird behaviour otherwise.
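
The two examples above can be checked against an engine that only offers scoped option groups. The sketch below uses Python's `re` module, which does not accept `(?i)` in the middle of a pattern, so each PCRE pattern is rewritten by hand with the scoped form `(?i:...)`. The rewrites are an assumption about equivalent behaviour based on the quoted text, not output from PCRE.

```python
import re

# PCRE "(a(?i)b)c": only "b" is caseless, "a" and "c" stay case-sensitive.
first = re.compile(r"(a(?i:b))c")

# PCRE "(a(?i)b|c)": the option carries into the second branch,
# so both "b" and "c" are caseless, while "a" is not.
second = re.compile(r"(a(?i:b)|(?i:c))")


def matches(pattern, subject):
    """Return True when the whole subject matches the pattern."""
    return pattern.fullmatch(subject) is not None


first_ok = [s for s in ("abc", "aBc", "Abc", "abC") if matches(first, s)]
second_ok = [s for s in ("ab", "aB", "c", "C", "Ab") if matches(second, s)]
```

`first_ok` contains "abc" and "aBc" only, and `second_ok` contains "ab", "aB", "c" and "C", in line with the behaviour described in the quoted passage.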

Note: There are other PCRE-specific options that can be set by the
application when the compiling or matching functions are called. In
some cases the pattern can contain special leading sequences such as
(*CRLF) to override what the application has set or what has been
defaulted. Details are given in the section entitled "Newline
sequences" above. There are also the (*UTF8), (*UTF16), (*UTF32), and
(*UCP) leading sequences that can be used to set UTF and Unicode
property modes; they are equivalent to setting the PCRE_UTF8,
PCRE_UTF16, PCRE_UTF32 and the PCRE_UCP options, respectively. The
(*UTF) sequence is a generic version that can be used with any of the
libraries. However, the application can set the PCRE_NEVER_UTF option,
which locks out the use of the (*UTF) sequences.