Uses Brzozowski derivatives to convert combinator-defined regexes to efficient DFAs for matching and recognition.
Following the paper, DFA states are identified with regular expressions, and transitions with the derivative operation. Algebraic rewrite rules are implemented as smart constructors. As in the paper, the rules define an approximate equivalence relation which is used instead of true regex equivalence to reduce the DFA, since true regex equivalence is prohibitively expensive to compute.
To support not just matching but recognition, the library introduces evidence Ev.t
type, a first-class match witness. Every state transition of the DFA is then annotated with an evidence transform (logically, Ev.t -> Ev.t
). The recognizer first runs a matcher through the DFA, recording the state trail. It then threads back the empty evidence through every transform corresponding to a visited state, obtaining a match evidence for the original expression.
The library supports a "parser combinator" interface, similarly to regex-applicative in Haskell.
The long-term goals of this project are:
-
Pedagogical - demonstrate and explain clearly the derivative approach for efficient regular expression matching and recognition
-
Verification - provide a certified regex library, as the mostly pure code should be verifiable in a proof assistant such as Coq
-
Practical - by tweaking the DFA representation there is potential to speed up the library to be on par with
ocamllex
andulex
Current state is experimental. Basic recognizers seem to work but testing is needed.
Pull requests are welcome.