textX/Arpeggio

Detect and report invalid grammars

igordejanovic opened this issue · 0 comments

Recent change [faaeaeb] (See the warning note in the changelog) makes the parser behave consistently but some pathological grammars now can lead the parser to loop endlessly (e.g. nesting Optional inside ZeroOrMore). This is frustrating for the user.

This issue will track the development of checker during parser construction which would catch those pathological cases and report the problem early.

In general, these cases should be checked:

  1. Direct or indirect left recursion. Leads to an infinite loop.
  2. Rules that always succeed (infallible) in OrderedChoice (Optional, ZeroOrMore, RegExMatch that match empty, sequences of the previous). This will not stop the parser from working but will prevent other alternatives to be tried. Thus, this might just be a warning and not a hard error. Or, we could insist on fixing this by e.g. removing unreachable choice alternatives.
  3. Rules that may succeed without consuming input (nullable) inside Zero/OneOrMore as it may lead to an infinite loop.

Algorithms:
Check 1 (left recursion):

  • Do a depth first search over the parser model.
  • For ordered choices follow each branch
  • For sequences follow leftmost rule, if the current rule in a sequence is nullable follow the next one also (hidden left recursion)
  • Keep a path of rules. If current rule was visited before report an error.

Check 2 (infallible in OrderedChoice):

  • Mark each rule during construction with infallible flag (in constructors)
  • Infallible rules are:
    • StringMatch with empty match
    • RegexMatch that can match empty (test by trying to match '')
    • ZeroOrMore
    • Optional
    • Sequence of infallible rules
    • OrderedChoice with an infallible branch
  • If any of the branch, except the last, in OrderedChoice is infallible report an error. Suggest removing unreachable choices (those that follows the infallible) as a fix.

Check 3 (nullable in repetitions):

  • Mark each rule during construction as nullable flag (in constructor):
  • Nullable rules are:
    • StringMatch with empty match
    • RegexMatch that can match empty (test by trying to match '')
    • Syntax predicates (lookaheads) - And and Not
    • Optional
    • ZeroOrMore
    • Sequence of nullable rules
    • OrderedChoice where any of the branch is nullable (some cases would be prevented by check 2, so it should run first)
  • If Zero/OneOrMore contains nullable rule report an error.

Note: These checks introduce an additional overhead. Since, the grammar won't change in production these check should be run only if the parser is in debug mode.