generator runs out of memory on large grammar
nchen63 opened this issue · 2 comments
I have a large grammar that causes lezer-generator to run out of memory, even when running node with --max-old-space-size set to 24 gigabytes.
I have tried to reduce the number of states by breaking rules with many optional parts out into separate rules, but that hasn't solved the problem. One rule does have ~600 or-ed tokens, and removing it did allow the grammar build to finish, but I don't know how that rule can be optimized. I tried breaking it up, but that didn't seem to help.
Can you offer guidance as to how the memory footprint could be reduced (either by modifying the grammar or with a PR)?
The parser-generator algorithm used by this tool definitely has limits, and you'll want to keep your (expanded) grammar size within reasonable bounds. A rule with 600 or-ed tokens sounds like a bad idea. Do these tokens actually serve different purposes in the grammar, or could you collapse them into a single, more generic token type? Optionals and or-expressions are compiled by expanding them into multiple rules (i.e. `(a | b) c?` becomes `a`, `b`, `a c`, and `b c`), and if you combine a bunch of them you can get a serious blowup in the number of rules.
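To make the blowup concrete, here is a small sketch (not lezer-generator's actual code; the expression shapes and the `expand` helper are made up for illustration) of how expanding choices and optionals into flat rule bodies multiplies the rule count:

```javascript
// Hypothetical sketch of or-expression/optional expansion, to show how
// the number of concrete rule bodies multiplies. Each body is an array
// of terminal names.
function expand(expr) {
  switch (expr.type) {
    case "term":     // a single token/rule name
      return [[expr.name]];
    case "choice":   // a | b | ... : union of each branch's expansions
      return expr.exprs.flatMap(expand);
    case "optional": // e? : either nothing or e
      return [[], ...expand(expr.expr)];
    case "seq":      // e1 e2 ... : cross product of the parts' expansions
      return expr.exprs.reduce(
        (acc, e) => acc.flatMap(a => expand(e).map(b => [...a, ...b])),
        [[]]
      );
  }
}

const t = name => ({ type: "term", name });

// (a | b) c? expands to 2 * 2 = 4 bodies: a, "a c", b, "b c"
const example = {
  type: "seq",
  exprs: [
    { type: "choice", exprs: [t("a"), t("b")] },
    { type: "optional", expr: t("c") },
  ],
};
console.log(expand(example).map(body => body.join(" ")));
```

Since each choice or optional in a sequence multiplies the count, a rule combining n two-way branches expands to 2^n bodies, which is why a handful of them in one rule can already overwhelm the generator.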