Recursive descent-parser for Java jdk18+

Introduction

This parser is a part of the compiler for the language "Context". You can watch the progress in the weekly german devlog on Youtube.

Features

This is a linear LL(k)-parser with error-recovery which is able to transform a text into an abstract syntax tree (AST) based on the following rules:

Terminals


Name Short Description
Literal Matches a previously set string.
Section Matches anything between two previously specified characters.
Pattern Matches a specified Regex-Pattern.

NonTerminals


Name Short Description
Ordered Matches a bunch of rules in their specified order.
Unordered Matches all passed rules in any order.
Alteration Matches any one of the passed rules.
Multiple Matches the passed rule as many times as possible.
Optional Tries to match the passed rule or an empty string.
Lazy Matches the passed rule lazily to enable circular dependencies.

Example

This is the EBNF for a simple mathematical expression:

nr  := '-'? ('0' | ... | '9')+
op  := '+' | '-' | '*' | '/'
exp := nr (op nr)*

And this is the implementation with this library (the boolean-literal describes the optionality of the rule):

Rule nr  = new Pattern("-?[0-9]");
Rule op  = new Alteration(false, "+", "-", "*", "/");
Rule exp = new Ordered(nr, new Multiple(true, op, nr));

Now calling exp.tokenize("10 + -20 * 147") will yield the following tree:

"TokenArray": {
  "LiteralToken": "10",
  "TokenList": [
    "TokenArray": {
      "LiteralToken": "+",
      "LiteralToken": "-20"
    },
    "TokenArray": {
      "LiteralToken": "*",
      "LiteralToken": "147"
    }
  ]
}

Meanwhile calling exp.tokenize("10 xxx * 147") will result in:

"TokenArray": {
  "LiteralToken": "10",
  "TokenList": [
    "ErrorToken": "xxx", <-- Error
    "TokenArray": {
      "LiteralToken": "*",
      "LiteralToken": "147"
    }
  ]
}

Dependencies

This project requires: