SyntaxError: Invalid regular expression: /*/iu: Nothing to repeat
ocavue opened this issue · 4 comments
I get the "Invalid regular expression" error when I search for the query .*? with the following options:
{
  unicode: true,
  interSplit: "[^\\p{L}\\p{Emoji}\\d']+", // note the added \p{Emoji}, to support emoji
  intraSplit: '\\p{Ll}\\p{Lu}',
  intraBound: '\\p{L}\\d|\\d\\p{L}|\\p{Ll}\\p{Lu}',
  intraChars: "[\\p{L}\\d']",
  intraContr: "'\\p{L}{1,2}\\b",
}
Steps to reproduce:
- Open https://stackblitz.com/edit/node-7bqgzj
- Run node index.js in the terminal
- You will see the error below:
SyntaxError: Invalid regular expression: /*/iu: Nothing to repeat
at new RegExp (<anonymous>)
at prepQuery (file:///home/projects/node-7bqgzj/node_modules/@leeoniya/ufuzzy/dist/uFuzzy.cjs.js:377:11)
at Object.filter (file:///home/projects/node-7bqgzj/node_modules/@leeoniya/ufuzzy/dist/uFuzzy.cjs.js:382:17)
at eval (file:///home/projects/node-7bqgzj/index.js:52:22)
at Generator.next (<anonymous>)
at _0x320919.run (https://node7bqgzj-023r.w-credentialless-staticblitz.com/blitz.41fbae16.js:352:373719)
at _0x28eaf8._evaluate (https://node7bqgzj-023r.w-credentialless-staticblitz.com/blitz.41fbae16.js:352:379291)
at async ModuleJob.run (https://node7bqgzj-023r.w-credentialless-staticblitz.com/blitz.41fbae16.js:181:2372)
I don't know if this is a bug in uFuzzy or an error in my options. If my options are incorrect, I would love to know what the correct options are for supporting Unicode with emoji. Thanks in advance.
> I get the "Invalid regular expression" error when I search for the query .*? with the following options:
btw, i hope you're not expecting this to work as a regexp injection, because that is not a supported thing. if you want to do a regexp search, then, well, just do a regexp loop over the haystack :)
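For reference, a "regexp loop over the haystack" could look roughly like the sketch below. regexpFilter is a made-up name, not part of uFuzzy; it returns the indexes of matching items, and it treats an invalid user-supplied pattern as "no matches" instead of throwing.

```javascript
// Hypothetical helper, not part of uFuzzy: plain regexp search over a haystack.
function regexpFilter(haystack, pattern, flags = 'iu') {
  let re;
  try {
    re = new RegExp(pattern, flags);
  } catch (err) {
    // User-supplied patterns such as "*" are invalid; report no matches.
    return [];
  }
  const indexes = [];
  for (let i = 0; i < haystack.length; i++) {
    if (re.test(haystack[i])) indexes.push(i);
  }
  return indexes;
}

console.log(regexpFilter(['apple', 'banana', 'cherry'], 'an.*?a')); // [ 1 ]
```

Avoid passing the g or y flag here, since re.test would then carry lastIndex state from one haystack item to the next.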
i guess the best we can do is run regexp escaping on the post-split tokens. turns out it's kind of meh, since the internal regexp is built up char-by-char, and escaping needs to be done at every location.
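The escaping idea can be sketched as follows. escapeRegExp is a hypothetical helper and not uFuzzy's actual code; it only illustrates why a token like * fails when compiled raw and works once every metacharacter is backslash-escaped.

```javascript
// Hypothetical helper: backslash-escape all regexp syntax characters.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const token = '*';
// new RegExp(token, 'iu') would throw "Nothing to repeat".
const re = new RegExp(escapeRegExp(token), 'iu');
console.log(re.test('a*b')); // true
```

Because the internal pattern is assembled character by character, an escape like this would have to be applied at each place a character is emitted, which is the awkward part described above.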
My current workaround is removing all special characters from the needle before searching. It seems to work well in our cases so far.
needle = needle.replace(/\p{Punctuation}+/gu, ' ')
const indexes = uf.filter(haystack, needle)
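One caveat with this workaround: \p{Punctuation} does not cover every regexp metacharacter, because +, ^, $ and | are classified as Unicode symbols rather than punctuation. The sketch below extends the character class to include them. sanitizeNeedle is a hypothetical name, and whether those four characters actually cause errors in uFuzzy is an assumption that has not been tested here. \p{Symbol} is deliberately not used, since it would also strip most emoji.

```javascript
// Hypothetical helper: replace punctuation plus the ASCII regexp
// metacharacters that Unicode classifies as symbols (+ ^ $ |).
function sanitizeNeedle(needle) {
  return needle.replace(/[\p{Punctuation}+^$|]+/gu, ' ');
}

console.log('+'.replace(/\p{Punctuation}+/gu, ' ')); // "+" (left untouched)
console.log(sanitizeNeedle('foo+bar'));              // "foo bar"
```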
yeah that would work. you can also try doing this instead:
interSplit: "\\p{Punctuation}+",
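A likely reason the original interSplit lets * through: the Unicode Emoji property includes the keycap base characters #, * and 0-9, so excluding \p{Emoji} from the split class keeps * as a token. The sketch below compares the two patterns on the needle .*?, assuming the needle is simply split on interSplit (uFuzzy's internals may differ). The tokens helper is made up for illustration.

```javascript
// The two interSplit patterns from this thread, as RegExp literals.
const original = /[^\p{L}\p{Emoji}\d']+/u;
const suggested = /\p{Punctuation}+/u;

// Hypothetical helper: split a needle and drop empty pieces.
const tokens = (needle, re) => needle.split(re).filter(Boolean);

console.log(/\p{Emoji}/u.test('*'));   // true
console.log(tokens('.*?', original));  // [ '*' ]
console.log(tokens('.*?', suggested)); // []
```

A lone * token is consistent with the /*/iu pattern shown in the error message.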