lezer-parser/lezer

Unterminated string handling regex can hang lezer-generator due to exponential backtracking

briangough opened this issue · 2 comments

I noticed a problem with lezer-generator taking a several minutes to report an "Unterminated string literal" error when reading a simple grammar containing an incorrectly terminated string. Profiling the code showed that the regexes used in the parse.ts next method were subject to exponential backtracking when the grammar file contains backslash characters later in the file.

Specifically the line let end = this.match(start + 1, /^(\\.|[^'])*'/) at parse.ts#L69 can match a backslash on both sides of the alternation operator, triggering the backtracking. The solution would be to change the regex to /^(\\.|[^'\\])*/ to exclude the backslash from the right hand side. There are three cases in the parse.ts next method: the unterminated strings " and ', and the unterminated character set [ case which have this problem and solution.

Here's a simple test case:

$ time node dist/lezer-generator.cjs  stringtest.grammar -o stringtest.js
Unterminated string literal (stringtest.grammar 2:9)

real	2m27.683s
user	2m27.674s
sys	0m0.005s
@tokens {
   foo { ' } // unterminated string
}

@top Program {
     Foo
}

Foo { foo "a long string\n a long string\n a long string\n a long
 string\n a long string\n a long string\n a long string\n a long
 string\n a long string\n a long string\n a long string\n a long
 string\n a long string\n a long string\n a long string\n a long
 string\n a long string\n a long string\n a long string\n a long
 string\n a long string\n a long string\n a long string\n a long
 string\n a long string\n a long string\n a long string\n a long
 string\n a long string\n"} // each \ triggers regex backtracking

After changing the regex as described above

$ time node dist/lezer-generator.cjs  stringtest.grammar -o stringtest.js
Unterminated string literal (stringtest.grammar 2:9)

real	0m0.064s
user	0m0.045s
sys	0m0.021s

Woops, thanks for spotting that. Attached patch should address it.

Thanks for the quick patch.