cs-au-dk/dk.brics.automaton

Ignore-case (?i) Flag Support in RegExp

DirkToewe opened this issue · 2 comments

The ignore-case flag (?i) is one JDK/Perl regex feature that is sorely missing from dk.brics.automaton. Since the question mark is a reserved character, adding the flag should hopefully not break existing regular expressions. If it did, the syntax could be made optional and enabled through a RegExp constructor flag.

If that flag syntax is acceptable, I would be willing to try implementing it.

The easiest approach is probably to add an automaton conversion operation (in SpecialOperations.java) that, for example, converts lower-case to upper-case characters in all transitions, and then to apply the same transformation to the strings being matched.
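A minimal sketch of the suggested operation, assuming a simplified automaton model: the State and Transition classes and the helpers literal, toUpperCase and run are hypothetical. They only loosely mirror the idea of states with char-range transitions and are not the dk.brics.automaton API. Every transition label is mapped to upper case, and each input character is upper-cased the same way before matching.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical, minimal automaton model for illustration only.
 * It is NOT the dk.brics.automaton API.
 */
final class CaseFoldSketch {
    static final class State {
        final List<Transition> transitions = new ArrayList<>();
        boolean accept;
    }

    static final class Transition {
        final char min, max; // inclusive label range
        final State to;

        Transition(char min, char max, State to) {
            this.min = min;
            this.max = max;
            this.to = to;
        }
    }

    /** Builds a linear automaton accepting exactly s; element 0 is the start state. */
    static List<State> literal(String s) {
        List<State> states = new ArrayList<>();
        State cur = new State();
        states.add(cur);
        for (char c : s.toCharArray()) {
            State next = new State();
            cur.transitions.add(new Transition(c, c, next));
            states.add(next);
            cur = next;
        }
        cur.accept = true;
        return states;
    }

    /**
     * Replaces every transition label by its upper-case form.
     * Ranges are expanded char by char: simple, but expensive for wide ranges.
     * The result may be non-deterministic.
     */
    static void toUpperCase(Collection<State> states) {
        for (State s : states) {
            List<Transition> mapped = new ArrayList<>();
            for (Transition t : s.transitions) {
                for (int c = t.min; c <= t.max; c++) {
                    char u = Character.toUpperCase((char) c);
                    mapped.add(new Transition(u, u, t.to));
                }
            }
            s.transitions.clear();
            s.transitions.addAll(mapped);
        }
    }

    /** Upper-cases each input char the same way, then simulates the automaton as an NFA. */
    static boolean run(State start, String input) {
        Set<State> current = new HashSet<>();
        current.add(start);
        for (char c : input.toCharArray()) {
            char u = Character.toUpperCase(c);
            Set<State> next = new HashSet<>();
            for (State s : current) {
                for (Transition t : s.transitions) {
                    if (t.min <= u && u <= t.max) {
                        next.add(t.to);
                    }
                }
            }
            current = next;
        }
        for (State s : current) {
            if (s.accept) return true;
        }
        return false;
    }
}
```

With literal("hello") converted by toUpperCase, run accepts "hello" and "HeLLo" but rejects "help". Because mapping labels can make previously distinct transitions overlap, the sketch simulates the result as an NFA; an operation inside the library would presumably determinize and minimize the converted automaton afterwards.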
You are most welcome to add such an automata operation.
I'm not sure why the question mark is relevant here. (Note that the question mark is already used in the regexp notation.)

Also note that the package is intentionally not JDK/Perl regex compliant - see the first item in the FAQ: https://www.brics.dk/automaton/faq.html

Okay, I will start by adding a special operation.

The JDK allows you to do something along the lines of:

Pattern.compile("(?i)Hello(?-i)World")

This makes the first part case-insensitive and the second part case-sensitive. IMHO this is much more concise and less error-prone than:

new RegExp("[Hh][Ee][Ll][Ll][Oo]World")
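Until such a flag exists, the bracketed form does not have to be typed by hand. The sketch below is a hypothetical helper (expand is not part of dk.brics.automaton or the JDK) that turns each cased letter into a two-character class and backslash-escapes other non-alphanumeric characters. java.util.regex reads such escapes as literals, and as far as I know the dk.brics RegExp syntax does too; it is intended for ASCII text.

```java
/**
 * Hypothetical helper (not part of dk.brics.automaton or the JDK):
 * expands a plain literal into an explicit case-insensitive pattern,
 * e.g. "Hello" -> "[hH][eE][lL][lL][oO]".
 * Works char by char, so it assumes ASCII text without special case mappings.
 */
final class IgnoreCaseExpansion {
    private IgnoreCaseExpansion() {}

    static String expand(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            char lower = Character.toLowerCase(c);
            char upper = Character.toUpperCase(c);
            if (lower != upper) {
                // Cased letter: accept both variants via a character class.
                sb.append('[').append(lower).append(upper).append(']');
            } else if (Character.isLetterOrDigit(c)) {
                // Uncased letter or digit: emit as-is
                // (escaping a digit could form a JDK backreference such as \1).
                sb.append(c);
            } else {
                // Punctuation, operators, whitespace: backslash-escape so it is read literally.
                sb.append('\\').append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, expand("Hello") + "World" yields [hH][eE][lL][lL][oO]World. Compiled with java.util.regex.Pattern, it agrees with Pattern.compile("(?i)Hello(?-i)World"): both match "hELLoWorld" and both reject "HelloWORLD". Note that the JDK's (?i) is ASCII-only by default; Unicode-aware case folding additionally requires the UNICODE_CASE flag, i.e. (?iu).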

I realize and appreciate the choice not to support all of the JDK/Perl regular expression syntax, but this is the one feature I miss most.