DmitrySoshnikov/regexp-tree

Broken optimize with multiple optional whitespace `\s`

Cherry opened this issue · 4 comments

This was first noticed in sindresorhus/eslint-plugin-unicorn#994

Optimising /\s?\s?/ results in an invalid regexp:

const {optimize} = require('regexp-tree');
const result = optimize(/\s?\s?/);
/*
_ast: {
    type: 'RegExp',
    body: { type: 'Alternative', expressions: [Array] },
    flags: ''
  },
  _source: null,
  _string: '/\\s?{2}/',
  _regexp: null,
  _extra: null
*/

The optimized result is /\s?{2}/, which isn't valid and throws a Invalid regular expression: /\s?{2}/: Nothing to repeat error. I believe \s{0,2} would be the correct optimization here.

On further investigation, this appears to impact a lot of character classes - seemingly any recurring optional character classes. This is specifically caused by the combineRepeatingPatterns transform.

/\w?\w?/ for example, as well as \r, \n, \v, etc.

Thanks for the report, I'll try to take a look sooner in this issue, and would appreciate a PR in case 😄

In fact, to avoid syntax error that should be wrapped into non-capturing group -- /(?:\s?){2}/, at which point I'm not sure if it's a good optimization. The easiest way is just to skip these classes from optimization, or (more complicated) -- to wrap them into the non-capturing group as long as they are repeated more than two times.

EDIT: yeah, \s{0,2} would work and the correct transform.

The quantifiersMerge correctly translates /\s?\s?/ into /\s{0,2}/, but then it's got reverted back to the original since resulting is longer, and then it fails in combineRepeatingPatterns.

So the /\s?\s?\s?/ works fine and is /\s{0,3}/ since it's shorter.

Fixed in 0421d57 and v0.1.22.