lezer-parser/lezer

generator runs out of memory on large grammar

nchen63 opened this issue · 2 comments

I have a large grammar that causes lezer-generator to run out of memory, even when running node with --max-old-space-size set to 24 GB.

I have tried to reduce the number of states by breaking rules that contain a large number of optional parts into separate rules, but that hasn't solved the problem. One rule does have ~600 or-ed tokens, and removing it did allow the grammar build to finish, but I don't know how that rule could be optimized. I tried breaking it up, but that didn't seem to help.
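For a sense of the shape involved, the rule looks roughly like this (names invented for the sketch; the real grammar is much larger):

```
// Rough, hypothetical shape of the ~600-alternative rule: every
// alternative is a distinct token, so this rule multiplies with any
// surrounding optionals when the generator expands the grammar.
builtinName {
  Abs | Acos | Asin | Atan |
  // ...some 600 more single-token alternatives...
  Zscore
}
```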

Can you offer guidance as to how the memory footprint could be reduced (either by modifying the grammar or with a PR)?

The parser-generator algorithm used by this tool definitely has limits, and you'll want to keep your (expanded) grammar size within reasonable bounds. A rule with 600 or-ed tokens sounds like a bad idea. Do these tokens actually serve different purposes in the grammar, or could you collapse them into a single, more generic token type? Optionals and or-expressions are compiled by expanding them into multiple rules (i.e. (a | b) c? becomes a, b, a c, and b c), so combining a bunch of them can cause a serious blowup in the number of rules.
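For illustration, a minimal sketch of that collapse, with invented rule and token names (an assumption about your grammar's shape, not a drop-in fix): one generic token matches all the names, and @specialize recovers distinct node types only for the few names that actually need them:

```
@top Program { expression* }

expression { SpecialCall | BuiltinCall }

// One generic Identifier token stands in for hundreds of
// near-identical builtin-name tokens.
BuiltinCall { Identifier "(" ")" }

// The few names that genuinely need their own node type are
// recovered by specializing the generic token at tokenize time,
// rather than by adding more alternatives to a rule.
SpecialCall { kw<"if"> "(" ")" }

kw<term> { @specialize[@name={term}]<Identifier, term> }

@skip { space }

@tokens {
  Identifier { $[a-zA-Z_] $[a-zA-Z0-9_]* }
  space { $[ \t\n]+ }
  "(" ")"
}
```

Because @specialize is resolved during tokenization, the 600-way or-expression disappears from the expansion entirely; only the handful of specialized names remain as separate terminals, so the state count grows with the number of distinct grammar shapes rather than with the number of names.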