/pegreg

Compiler that goes from a regular subset of PEG to FSTs.

Primary LanguageCoq

Build Status License Lua Coverage Status

Poster Explaining What PEGREG is

PEGREG

A lua library for compiling a subset of PEGs to FSTs. Requires fst-fast-system (an NFST interpreter) and fst-fast (a lua library wrapping fst-fast-system).

Roadmap

For arbitrary regular expressions or string literals A and B, this library turns A/B, AB, and A* into DFAs quite well. However, this library aims to make A/B and A* possessive, and allow linear-time matching and capture. Therefore, there are still two major items on the roadmap before this library can be made a part of Rosie:

  1. Possessiveness. This will require queries that can find the following:
  • The DFA of an NFA
  • The states that ought to be demoted (their outgoing arrows ignored)
  • The arrows that ought to be ignored
  • The NFA resulting from removing those arrows

And interpreters (AST transformers) that can generate

  • NFAs from each subexpression of the AST
  • DFAs from each sub-nfa
  • B substates from each subexpression
  • B states from each subexpression
  • B subarrows from each subexpression
  • B arrows from each subexpression
  • The possessive NFA that results from removing the forbidden b arrows.
  1. Matching subexpressions This library intends to use Danny Dubé and Marc Feely's method of extracting matches from DFAs, described in these two papers: Efficiently building a parse tree from a regular expression Automatic construction of parse trees for lexemes And implemented in SILex

to do this, we need

  • A backend that can accept push, snoc, and sel, instead of DFSTs operating from chars to char
  • A frontend that emits those instructions instead of DFSTs from char to char

Interpreter Style

Each interpreter in PEGREG's interpreter chain forms a tagless final interpreter.

And the FST uses a version of finite state transducers.