Broken optimize with multiple optional whitespace `\s`
Cherry opened this issue · 4 comments
This was first noticed in sindresorhus/eslint-plugin-unicorn#994
Optimising /\s?\s?/
results in an invalid regexp:
const {optimize} = require('regexp-tree');
const result = optimize(/\s?\s?/);
/*
_ast: {
type: 'RegExp',
body: { type: 'Alternative', expressions: [Array] },
flags: ''
},
_source: null,
_string: '/\\s?{2}/',
_regexp: null,
_extra: null
*/
The optimized result is /\s?{2}/
, which isn't valid and throws a Invalid regular expression: /\s?{2}/: Nothing to repeat
error. I believe \s{0,2}
would be the correct optimization here.
On further investigation, this appears to impact a lot of character classes - seemingly any recurring optional character classes. This is specifically caused by the combineRepeatingPatterns
transform.
/\w?\w?/
for example, as well as \r
, \n
, \v
, etc.
Thanks for the report, I'll try to take a look sooner in this issue, and would appreciate a PR in case 😄
In fact, to avoid syntax error that should be wrapped into non-capturing group -- /(?:\s?){2}/
, at which point I'm not sure if it's a good optimization. The easiest way is just to skip these classes from optimization, or (more complicated) -- to wrap them into the non-capturing group as long as they are repeated more than two times.
EDIT: yeah, \s{0,2}
would work and the correct transform.
The quantifiersMerge
correctly translates /\s?\s?/
into /\s{0,2}/
, but then it's got reverted back to the original since resulting is longer, and then it fails in combineRepeatingPatterns
.
So the /\s?\s?\s?/
works fine and is /\s{0,3}/
since it's shorter.
Fixed in 0421d57 and v0.1.22
.