leeoniya/uFuzzy

SyntaxError: Invalid regular expression: /*/iu: Nothing to repeat

ocavue opened this issue · 4 comments

I get an "Invalid regular expression" error when I search for the query .*? with the following options:

{
  unicode: true,
  interSplit: "[^\\p{L}\\p{Emoji}\\d']+", // Note: \p{Emoji} is added here to support emoji
  intraSplit: '\\p{Ll}\\p{Lu}',
  intraBound: '\\p{L}\\d|\\d\\p{L}|\\p{Ll}\\p{Lu}',
  intraChars: "[\\p{L}\\d']",
  intraContr: "'\\p{L}{1,2}\\b",
}

Steps to reproduce:

  1. Open https://stackblitz.com/edit/node-7bqgzj
  2. Run node index.js in the terminal
  3. You will see the error below:
SyntaxError: Invalid regular expression: /*/iu: Nothing to repeat
    at new RegExp (<anonymous>)
    at prepQuery (file:///home/projects/node-7bqgzj/node_modules/@leeoniya/ufuzzy/dist/uFuzzy.cjs.js:377:11)
    at Object.filter (file:///home/projects/node-7bqgzj/node_modules/@leeoniya/ufuzzy/dist/uFuzzy.cjs.js:382:17)
    at eval (file:///home/projects/node-7bqgzj/index.js:52:22)
    at Generator.next (<anonymous>)
    at _0x320919.run (https://node7bqgzj-023r.w-credentialless-staticblitz.com/blitz.41fbae16.js:352:373719)
    at _0x28eaf8._evaluate (https://node7bqgzj-023r.w-credentialless-staticblitz.com/blitz.41fbae16.js:352:379291)
    at async ModuleJob.run (https://node7bqgzj-023r.w-credentialless-staticblitz.com/blitz.41fbae16.js:181:2372)

I don't know if this is a bug in uFuzzy or an error in my options. If my options are incorrect, I would love to know what the correct options are for supporting Unicode with emoji. Thanks in advance.

interesting, apparently * is an emoji 🤷
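A quick way to check this in Node is sketched below. It assumes the needle is tokenized by splitting on interSplit and dropping empty strings, which is a simplification rather than uFuzzy's actual code path. Because `*` has the Unicode Emoji property, the custom interSplit class no longer treats it as a separator, so it survives as a token and ends up in a pattern unescaped.

```javascript
// `*` (U+002A) carries the Unicode Emoji property; `.` and `?` do not.
const isEmoji = (ch) => /\p{Emoji}/u.test(ch);

console.log(isEmoji('*')); // true
console.log(isEmoji('.')); // false
console.log(isEmoji('?')); // false

// Assumption: the needle is tokenized by splitting on interSplit and
// discarding empty strings (a simplification, not uFuzzy's real code).
const interSplit = new RegExp("[^\\p{L}\\p{Emoji}\\d']+", 'u');
const tokens = '.*?'.split(interSplit).filter((t) => t !== '');
console.log(tokens); // [ '*' ]

// Compiling the surviving token as a pattern throws a SyntaxError,
// which is the kind of failure reported in this issue.
let message = null;
try {
  new RegExp(tokens[0], 'iu');
} catch (err) {
  message = err.message;
}
console.log(message);
```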

> I get the Invalid regular expression error when I try to search query .*? with the following options:

btw, i hope you're not expecting this to work as regexp injection, because that is not supported. if you want a regexp search, then, well, just loop over the haystack with a regexp :)

i guess the best we can do is regexp-escape the post-split tokens. turns out that's awkward, since the internal regexp is built up char-by-char, so the escaping has to be done at every location.
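For illustration, escaping a single token could look like the sketch below. `escapeRegExp` is a hypothetical helper, not part of uFuzzy, and this is not the library's fix. It only shows what escaping a post-split token means before the token is placed in a pattern.

```javascript
// Hypothetical helper, not part of uFuzzy: backslash-escape regexp
// metacharacters so that a token is matched literally.
function escapeRegExp(token) {
  return token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// The escaped token now compiles, even with the `u` flag.
const pattern = new RegExp(escapeRegExp('*'), 'iu');
console.log(pattern.test('a*b')); // true
console.log(pattern.test('ab'));  // false
```

Since the internal regexp is assembled char-by-char, a per-token helper like this would not be enough on its own; the same escaping would be needed wherever individual characters are inserted.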

My current workaround is to replace every run of punctuation in the needle with a space before searching. It seems to work well in our cases so far.

needle = needle.replace(/\p{Punctuation}+/gu, ' ')
const indexes = uf.filter(haystack, needle)

yeah that would work. you can also try doing this instead:

interSplit: "\\p{Punctuation}+",
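
As a rough sketch of the effect, again assuming the needle is tokenized by splitting on interSplit and dropping empty strings (not verified against the uFuzzy source), the problematic query produces no tokens at all with this setting, so no `*` reaches a pattern:

```javascript
// Assumption: the needle is tokenized by splitting on interSplit and
// discarding empty strings (a simplification, not uFuzzy's real code).
const interSplit = new RegExp('\\p{Punctuation}+', 'u');
const tokenize = (needle) => needle.split(interSplit).filter((t) => t !== '');

console.log(tokenize('.*?'));       // []
console.log(tokenize('foo.*?bar')); // [ 'foo', 'bar' ]
```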