All regular expressions can be converted to a non-deterministic finite automata (NFA). The goal is of this project is to take in some regular expression and generate a NFA to run strings on. Any string over the base alphabet is either in the language specified by the regular expression or it is not.
We would like to take two strings, a specification for a regular expression and an input string. First we construct a NFA from the expression by from the follwing steps:
- Tokenize the regular expression
- Parse the token stream with the precedence perserving grammar below.
- Walk the tree.
Generate atomic NFA's from the leaf nodes and push onto an expression stack.
Any time we exit an operator node (
union
,concat
orstar
), pop existing NFAs from the expression stack and construct a new NFA from its closure property. - Pop the last NFA off the expression stack.
- Run the breadth-first-search
execute
method on the NFA object to make all avaialble transitions from the input string.accept
if we have consumed the entire input string and the current state is accept. Otherwisereject
regex: union;
union: concat (UNION concat)*;
concat: star (CONCAT star)*;
star: paren (STAR)*;
paren: leaf | '('union ')';
leaf: (LETTER | EPSILON | EMPTY_SET);
UNION = 'U';
CONCAT = '&';
STAR = '*';
LETTER = [a-zA-Z];
EPSILON = '\e';
EMPTY_SET = '\0';
0. parentheses '(' ')'
1. kleen star '*'
2. concatenation '&'
3. union 'U'
git clone git@github.com:JustinMeimar/nfa-regex.git
cd nfa-regex
mkdir build && cd build && cmake ..
make
../bin/regex <regular-expression> <input string>
- Grammar and tokenization
- Recursive Descent parser to recognize valid regular expressions
- Respect operator precedence.
- Parse Tree generation
- Generate NFA by walking the Parse Tree
- execute strings on NFA
- Union closure
- Concatenation closure
- Star closure
- Automated testcases
- Web GUI on website