By Tim Henderson
Copyright 2014-2017, All Rights Reserved. Made available for public use under the terms of a BSD 3-Clause license.
lexmachine is a full lexical analysis framework for the Go programming
language. It supports a restricted but usable set of regular expressions
appropriate for writing lexers for complex programming languages. The
framework also supports sub-lexers and non-regular lexing through an
"escape hatch" which allows users to consume any number of further
bytes after a match. So if you want to support nested C-style comments
or other paired structures, you can do so at the lexical analysis stage.
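As a conceptual sketch of what the escape hatch enables (this is not lexmachine's actual API), imagine an action that fires after the lexer matches an opening `/*` and then consumes bytes itself, tracking nesting depth, so the whole nested comment becomes one token:

```go
package main

import "fmt"

// consumeNestedComment illustrates the "escape hatch" idea: after a lexer
// matches the opening "/*", an action keeps consuming bytes on its own,
// tracking nesting depth, and reports where the token really ends.
// It returns the index just past the matching "*/", or -1 if unterminated.
func consumeNestedComment(text []byte, start int) int {
	depth := 1 // we are called just past an opening "/*"
	for i := start; i+1 < len(text); i++ {
		switch {
		case text[i] == '/' && text[i+1] == '*':
			depth++
			i++
		case text[i] == '*' && text[i+1] == '/':
			depth--
			i++
			if depth == 0 {
				return i + 1
			}
		}
	}
	return -1
}

func main() {
	src := []byte("/* outer /* inner */ still outer */ rest")
	end := consumeNestedComment(src, 2) // 2 = just past the initial "/*"
	fmt.Printf("comment = %q\n", src[:end])
	// prints: comment = "/* outer /* inner */ still outer */"
}
```

Nesting like this is not a regular language, which is exactly why a purely regex-driven lexer needs such a hook to handle it.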
This library was written while I was teaching EECS 337 Compiler Design and Implementation at Case Western Reserve University in Fall 2014. I wrote two compilers with it. One was "hidden" from the students because the language it implemented was their project language. The other was tcel, written initially as an example of how to do type checking; that compiler was later expanded to explain AST interpretation, intermediate code generation, and x86 code generation.
lexmachine includes the following components:
- A parser for a restricted set of regular expressions.
- An abstract syntax tree (AST) for regular expressions.
- A backpatching code generator which compiles the AST to (NFA) machine code.
- An NFA simulation based lexical analysis engine. A lexical analysis engine works slightly differently from a normal regular expression engine: it tokenizes a stream rather than matching one string.
- Match objects which include the start and end column and line numbers of the lexemes as well as their associated token names.
- A declarative "DSL" for specifying the lexers.
- An "escape hatch" which allows one to match non-regular tokens by consuming any number of further bytes after the match.
- An in-progress (but working) DFA backend which currently runs DFA simulation but is intended to support code generation as well.
- A command, lexc, which compiles a sequence of patterns into an NFA. It was mostly written to support a homework assignment for the class.
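To make the tokenize-a-stream point concrete, here is a minimal, self-contained sketch (deliberately not using lexmachine's API) of what a lexical analysis engine does: repeatedly take the longest match at the current position ("maximal munch") and record line and column information for each lexeme. The token names and patterns are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Token pairs a token name with its lexeme and starting line/column,
// mirroring the kind of information a lexer's match objects carry.
type Token struct {
	Name         string
	Lexeme       string
	Line, Column int
}

// patterns is a toy token set; a real lexer compiles these into one NFA/DFA
// instead of trying each pattern separately.
var patterns = []struct {
	name string
	re   *regexp.Regexp // anchored so it only matches at the current position
}{
	{"NAME", regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*`)},
	{"NUMBER", regexp.MustCompile(`^[0-9]+`)},
	{"OP", regexp.MustCompile(`^[+\-*/=]`)},
	{"SPACE", regexp.MustCompile(`^[ \t\n]+`)},
}

// tokenize scans the whole input, taking the longest match at each
// position (maximal munch) and tracking line/column for every token.
func tokenize(input string) ([]Token, error) {
	var toks []Token
	line, col := 1, 1
	for pos := 0; pos < len(input); {
		best, bestName := "", ""
		for _, p := range patterns {
			if m := p.re.FindString(input[pos:]); len(m) > len(best) {
				best, bestName = m, p.name
			}
		}
		if best == "" {
			return nil, fmt.Errorf("no token matches at line %d col %d", line, col)
		}
		if bestName != "SPACE" { // skip whitespace tokens
			toks = append(toks, Token{bestName, best, line, col})
		}
		for _, r := range best { // advance the line/column counters
			if r == '\n' {
				line, col = line+1, 1
			} else {
				col++
			}
		}
		pos += len(best)
	}
	return toks, nil
}

func main() {
	toks, err := tokenize("x = 10 +\ny2")
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("%s %q at %d:%d\n", t.Name, t.Lexeme, t.Line, t.Column)
	}
}
```

The key contrast with ordinary regexp matching is the outer loop: instead of asking "does this one string match?", the engine asks "which token starts here?" over and over until the stream is exhausted.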
I am just now starting to document this project. My intention is to clean it up with examples and to finish and merge the DFA backend. I also want to rewrite the frontend to support the full set of regular expressions; I wasn't able to do that initially because of time constraints.