Unterminated string handling regex can hang lezer-generator due to exponential backtracking
briangough opened this issue · 2 comments
I noticed a problem with lezer-generator
taking a several minutes to report an "Unterminated string literal" error when reading a simple grammar containing an incorrectly terminated string. Profiling the code showed that the regexes used in the parse.ts
next method were subject to exponential backtracking when the grammar file contains backslash characters later in the file.
Specifically the line let end = this.match(start + 1, /^(\\.|[^'])*'/)
at parse.ts#L69 can match a backslash on both sides of the alternation operator, triggering the backtracking. The solution would be to change the regex to /^(\\.|[^'\\])*/
to exclude the backslash from the right hand side. There are three cases in the parse.ts next
method: the unterminated strings "
and '
, and the unterminated character set [
case which have this problem and solution.
Here's a simple test case:
$ time node dist/lezer-generator.cjs stringtest.grammar -o stringtest.js
Unterminated string literal (stringtest.grammar 2:9)
real 2m27.683s
user 2m27.674s
sys 0m0.005s
@tokens {
foo { ' } // unterminated string
}
@top Program {
Foo
}
Foo { foo "a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n a long string\n a long string\n a long
string\n a long string\n"} // each \ triggers regex backtracking
After changing the regex as described above
$ time node dist/lezer-generator.cjs stringtest.grammar -o stringtest.js
Unterminated string literal (stringtest.grammar 2:9)
real 0m0.064s
user 0m0.045s
sys 0m0.021s
Woops, thanks for spotting that. Attached patch should address it.
Thanks for the quick patch.