Primary language: C#. License: Apache License 2.0.

Berp

A flexible cross-language parser generator with support for languages without explicit tokenization rules (like Gherkin).

Installation

Berp can be installed from NuGet. The executable is in the tools/net471 folder inside the package.

Features

  • generates a parser for its own grammar (the "hello world" for parser generators), see Berp Grammar
  • does not generate a lexer/tokenizer, so it is well suited to languages where tokenization is either easy or not really possible
  • simple, BNF-like grammar definition
  • supports multiple target languages (currently C#, Java, Ruby, JavaScript, Go, Python) with the same grammar (code generation for each language is specified in template files)
  • allows building an AST through AST-building hooks
  • supports streamed token reading (tokens can be kept attached to the input stream to avoid unnecessary data transfer and object creation)
  • supports context-sensitive tokens; the tokenization rules can also be changed during parsing (e.g. when a #language: no is encountered)
  • supports a special "other" token that matches the "anything-else" case, when there is no better match
  • support for recursive grammar rules is limited (they are parsed only up to a certain level)
  • simple look-ahead rules can be specified
  • rules can be marked as production rules to be represented in the AST
  • allows capturing ignored content tokens (e.g. comments)
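
Because Berp does not generate a tokenizer, the token matching is written by hand. The following is a minimal Python sketch of what such a matcher could look like, showing the context-sensitive language switch, the "other" fallback, comment capture, and streamed reading from the list above. It is not Berp's API: Token, TokenMatcher, tokenize, the token type names, and the DIALECTS keyword table are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative keyword tables, one per dialect (assumed data, not taken from Berp).
DIALECTS = {
    "en": {"FeatureLine": "Feature", "ScenarioLine": "Scenario"},
    "no": {"FeatureLine": "Egenskap", "ScenarioLine": "Scenario"},
}

LANGUAGE_RE = re.compile(r"^\s*#\s*language\s*:\s*([A-Za-z-]+)\s*$")


@dataclass
class Token:
    type: str   # "Language", "Comment", "FeatureLine", "ScenarioLine" or "Other"
    text: str
    line: int


class TokenMatcher:
    """Hand-written, line-based matcher (hypothetical, for illustration only)."""

    def __init__(self, dialect="en"):
        self.keywords = DIALECTS[dialect]

    def match(self, line, line_no):
        header = LANGUAGE_RE.match(line)
        if header:
            # Context-sensitive step: later lines are matched with new keywords.
            # An unknown dialect code raises KeyError in this sketch.
            self.keywords = DIALECTS[header.group(1)]
            return Token("Language", header.group(1), line_no)

        stripped = line.strip()
        if stripped.startswith("#"):
            # Comments are kept as tokens instead of being silently dropped.
            return Token("Comment", stripped, line_no)

        for token_type, keyword in self.keywords.items():
            prefix = keyword + ":"
            if stripped.startswith(prefix):
                return Token(token_type, stripped[len(prefix):].strip(), line_no)

        # Nothing more specific matched: fall back to the "other" token.
        return Token("Other", stripped, line_no)


def tokenize(lines):
    """Yield tokens lazily, one per input line (streamed reading)."""
    matcher = TokenMatcher()
    for line_no, line in enumerate(lines, start=1):
        yield matcher.match(line, line_no)
```

In this sketch, once a "# language: no" line has been read, "Egenskap: Login" is matched as a FeatureLine, while "Feature: Login" no longer matches any keyword and falls through to the Other token.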

Samples

Supported target languages

  • C# - CSharp.razor
  • Java - Java.razor
  • Ruby - Ruby.razor
  • JavaScript (TypeScript) - TypeScript.razor
  • Go - Go.razor
  • Python - Python.razor
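
The idea of driving several target languages from one grammar through templates can be sketched as follows. Berp itself uses the Razor templates listed above; this sketch substitutes Python's string.Template purely as a stand-in, and the TEMPLATES table, the render function, and the RuleType output shape are assumptions made for illustration, not the contents of the real template files.

```python
from string import Template

# One entry per target language: (outer template, member template, separator).
# These templates are invented for this sketch.
TEMPLATES = {
    "java": ("public enum RuleType {\n$members\n}\n", "    $name", ",\n"),
    "python": ("class RuleType(Enum):\n$members\n", "    $name = $index", "\n"),
}


def render(language, rule_types):
    """Render the same list of rule names through a language-specific template."""
    outer, member, separator = TEMPLATES[language]
    members = separator.join(
        Template(member).substitute(name=name, index=index)
        for index, name in enumerate(rule_types)
    )
    return Template(outer).substitute(members=members)


if __name__ == "__main__":
    rules = ["Feature", "Scenario", "Step"]
    for language in TEMPLATES:
        print(render(language, rules))
```

The grammar-derived data (here just a list of rule names) stays the same for every target; only the template changes, which is why adding a language amounts to adding a template.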

TODO