- DFA to regex
- Regex intersection, union, negation operation
- DFA compression
- Epsilon free NFA conversion
- Using range to represent transitions
Clone this repo and build it in Xcode
the basic work flow is like this:
- (BOOL)workFlowWithRegex:(NSString *)regex
andInput:(NSString *)input{
// Parsing For Regex, you can cache the parser and scanner if needed
WRLR1Parser *parser = [[WRLR1Parser alloc] init];
WRLanguage *language = [[WRRegexLanguage alloc] init];
WRRegexScanner *scanner = [[WRRegexScanner alloc] init];
parser.language = language;
parser.scanner = scanner;
[parser prepare];
scanner.inputStr = regex;
[parser startParsing];
WRAST *ast = [language astNodeForToken:parser.parseTree];
// build dfa for regex
WRCharRangeNormalizeMapper *mapper = [[WRCharRangeNormalizeMapper alloc] initWithRanges:scanner.ranges];
for (WRCharTerminal *charTerminal in scanner.charTerminals) {
charTerminal.rangeIndexes = [mapper decomposeRangeList:charTerminal.ranges];
};
WRREFAManager *manager = [[WRREFAManager alloc] initWithCharRangeMapper:mapper
ast:ast];
WRREFABuilder *builder = manager.builder;
// [builder printDFA]; print the state
// [builder DFA2Regex]; if you have dfa states, you can transfer it to regex
// you can use the regex to match multiple inputs
return [builder matchWithString:input];
}
you can use this lib to wrap your own regex functions, cache the parser, scanner, dfaManager if needed. For more examples, plz see section 4 and examples in testcases.
- clojure operator : *
e.g. 1* // zero or more char '1' - plus operator : +
e.g. 1+ // one or more char '1' - question operator : ?
e.g. 1? // zero or one char '1'
- negetion operator : /!
e.g. /!1 // the negation of expression "1" - intersection operator : /&
e.g. 1/&2 // the intersection of expression "1" and expression "2" - union operator : /|
e.g. 1/|2 // the union of expression "1" and expression "2"
- alternate operator : |
e.g. 1|2 // new expresssion that is the alternate of expression "1" and expression "2" - concatenate operator : (virtual operator) // when a expression(expr2) is right after another one(expr1), we have a new expression expr formed by expr1 concatenated by expr2
e.g. 12 // new expression that is concatenation of expression "1" and expression "2"
-
single char :any single ASCII char except the former sepecial ones
e.g. 1, 2, a, _, !, %, , & -
escaped char:
- represent the char that sepecail operators use
e.g. /? // char '?'
e.g. /( // char '('
e.g. // // char '/' - a short notion for ragne
e.g. /d //all single digits, from '0' to '9'
e.g. /w //'0' to '9', 'a' to 'z', 'A' to 'Z'
e.g. /a //'a' to 'z',
e.g. /A //A' to 'Z' - others
e.g. /t(tab), /n((new line), /r(carriage return)
- represent the char that sepecail operators use
-
char range :
- a pair of brackets with characters inside
e.g. [abcde] //a|b|c|d|e
e.g. [!.+](!|.|+ notice that special operators here are treated as characters) - a pair of brackets with character-character patterns
e.g. [a-e] //a|b|c|d|e,
e.g. [a-c0-4] //a|b|c|d|0|1|2|3|4 this style is preffered, and some optimizations are done for it
- a pair of brackets with characters inside
- any: . //represent any single char in range
e.g. .. //match any string at length of 2 - parentheses: () //raise the priority of the expression to the highest
e.g. (1|2)* //zero or more alternation of char '1' and char '2'
() > * = + = ? > concatenate > | > \! > & > \|
"1" // a single char '1'
"1*1+1?" // zero or more '1' followed by one or more '1' followed by zero or one '1'
"\!1" // negation of '1'
".*111.*\&\!.*00.*" // intersection of regex1: any string contains "111", and regex2: any string does not contain "00"
"(0|1)*111[01]*\&\![01]*00[01]*" // regex of "(0|1)*111[01]* "intersects with the negation of "[01]*00[01]*"