Detect and report invalid grammars
igordejanovic opened this issue · 0 comments
Recent change [faaeaeb] (See the warning note in the changelog) makes the parser behave consistently but some pathological grammars now can lead the parser to loop endlessly (e.g. nesting Optional
inside ZeroOrMore
). This is frustrating for the user.
This issue will track the development of checker during parser construction which would catch those pathological cases and report the problem early.
In general, these cases should be checked:
- Direct or indirect left recursion. Leads to an infinite loop.
- Rules that always succeed (infallible) in
OrderedChoice
(Optional
,ZeroOrMore
,RegExMatch
that match empty, sequences of the previous). This will not stop the parser from working but will prevent other alternatives to be tried. Thus, this might just be a warning and not a hard error. Or, we could insist on fixing this by e.g. removing unreachable choice alternatives. - Rules that may succeed without consuming input (nullable) inside
Zero/OneOrMore
as it may lead to an infinite loop.
Algorithms:
Check 1 (left recursion):
- Do a depth first search over the parser model.
- For ordered choices follow each branch
- For sequences follow leftmost rule, if the current rule in a sequence is nullable follow the next one also (hidden left recursion)
- Keep a path of rules. If current rule was visited before report an error.
Check 2 (infallible in OrderedChoice
):
- Mark each rule during construction with
infallible
flag (in constructors) - Infallible rules are:
StringMatch
with empty matchRegexMatch
that can match empty (test by trying to match '')ZeroOrMore
Optional
Sequence
of infallible rulesOrderedChoice
with an infallible branch
- If any of the branch, except the last, in
OrderedChoice
is infallible report an error. Suggest removing unreachable choices (those that follows the infallible) as a fix.
Check 3 (nullable in repetitions):
- Mark each rule during construction as
nullable
flag (in constructor): - Nullable rules are:
StringMatch
with empty matchRegexMatch
that can match empty (test by trying to match '')- Syntax predicates (lookaheads) -
And
andNot
Optional
ZeroOrMore
Sequence
of nullable rulesOrderedChoice
where any of the branch is nullable (some cases would be prevented by check 2, so it should run first)
- If
Zero/OneOrMore
contains nullable rule report an error.
Note: These checks introduce an additional overhead. Since, the grammar won't change in production these check should be run only if the parser is in debug mode.