Escape function returns regex that is incompatible with 'u' flag
raydog opened this issue · 9 comments
The XRegExp.escape() method documentation currently says:
The result can safely be used at any point within a regex that uses any flags.
However, the escape method emits a bad pattern for unicode regexps. This is because whitespace is currently being escaped (Ie: " " becomes "\ ") and unicode regexps don't support that. This currently fails in Node 6 / 8, and AFAICT (since the documentation is fairly dense...) ES2015 Annex B.1.4 seems to say that the regex 'u' flag should disable the \{any character} -> [{any character}] behaviors that this whitespace escaping thing was depending on.
So this code:
var XRegExp = require('XRegExp');
var re = new XRegExp(XRegExp.escape(" "), "u");Will throw this error:
/Users/ray/patt/node_modules/XRegExp/xregexp-all.js:3377
new RegExp(generated.pattern, generated.flags),
^
SyntaxError: Invalid regular expression: /\ /: Invalid escape
at RegExp (<anonymous>)
at XRegExp (.../node_modules/XRegExp/xregexp-all.js:3377:9)
So either the documentation should be updated, or .escape() should stop escaping whitespace. (Maybe only escape it / stop escaping it with a flag?)
Annex B.1.4 does not apply to Unicode RegExps per https://tc39.github.io/ecma262/#sec-regular-expressions-patterns:
However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the
[U]parameter present on the goal symbol.
AFAICT, the relevant spec bit is here: https://tc39.github.io/ecma262/#sec-forbidden-extensions
The RegExp pattern grammars in 21.2.1 and B.1.4 must not be extended to recognize any of the source characters
A-Zora-zasIdentityEscape[U]when theUgrammar parameter is present.
Where is /\ /u disallowed?
I'm probably wrong about the section then, since that's just the section that was referenced in the commit where I thought that functionality was removed. (After a closer look, I think it was just a refactor that touched the line.)
But V8 does very specifically disallow the /\ /u thing:
https://chromium.googlesource.com/v8/v8.git/+/e918c4ec464456a374098049ca22eac2107f6223
specifically, the line:
// With /u, no identity escapes except for syntax characters
// are allowed. Otherwise, all identity escapes are allowed.
if (unicode()) {
return ReportError(CStrVector("Invalid escape"));
}Maybe this is disallowed in 21.2.1 by https://tc39.github.io/ecma262/#prod-IdentityEscape ? AFAICT, the bit that would have allowed \{any source-compatible character} is tagged as [~U], so not available in unicode regexps.
You’re absolutely right — https://tc39.github.io/ecma262/#prod-`IdentityEscape` disallows it. (I somehow misread SyntaxCharacter as SourceCharacter before, which makes no sense.)
BTW the change that introduced this behavior happened back when the spec was still a Word document, so there’s no matching commit 🤦♂️ Here’s some background: https://bugs.ecmascript.org/show_bug.cgi?id=3157#c9
@raydog What change would you propose here? I'm not sure how to best address this, since not escaping whitespace would mean the resulting string can't be safely embedded in an XRegExp regex that uses flag x.
(XRegExp.escape was created years before /u was introduced.)
@slevithan Perhaps, similar to some of the other functions in the library, this function can accept a second flags parameter, and then the function would escape assuming a regexp in that syntax? The "no-argument" case can default to the existing "x" behavior.
Alternatively, a second method can be added for escaping unicode regexps... I think I like the first option more tho.
Just an update on this ticket. This is still failing on the current Node.js LTS (14). I ran into this while converting a Perl script (which used quotemeta) to JS. Are there any plans to address this, either in this library or in ES?
@jameschensmith This is now fixed in e22a52b and will be in the next release (v5.0.0 -- no timeline yet). The fix should work in all cases (including with flags x and u) and does not require providing an additional argument to XRegExp.escape.
@raydog and @jameschensmith, I just released v5.0.0, so you should be able to try it and confirm things are now working.