Pegular expressions (aka Pegexp, formally "regular PEGs") are possessive regular expressions that use prefix operators. They work like regular expressions, but with prefix notation and possessive behaviour. Written in C(++).
- `^`: Start of the input, or the start of any line
- `$`: End of the input, or the end of any line
- `.`: Any character, including a newline
- `?`: Zero or one of the following expression
- `*`: Zero or more of the following expression
- `+`: One or more of the following expression
- `(expr)`: Group subexpressions (does not imply capture)
- `|A|B...`: Either A or B (or ...). Note that the first alternative is also preceded by `|`
- `&A`: Continue only if A succeeds
- `!A`: Continue only if A fails
- `anychar`: Match that character, if it is not an operator
- `\char`: Match the escaped character (including any operator, `\0` `\b` `\e` `\f` `\n` `\r` `\t`, and any other character)
- `\177`: Match the character with the specified octal code
- `\xXX`: Match the character with the specified two-digit hexadecimal code (0-9a-fA-F)
- `\x{1-2}`: Match the character with the specified 1-2 digit hexadecimal code, written in braces
- `\u12345`: Match the Unicode character with the specified 1-5 digit code (only if compiled for Unicode support)
- `\u{1-5}`: Match the Unicode character with the specified 1-5 digit code, written in braces (only if compiled for Unicode support)
- `[a-z]`: Normal character class (a literal hyphen may occur at the start)
- `[^a-z]`: Negated character class. Characters may include the `\` escapes listed above
- `{n,m}`: Match from n (default 0) to m (default unlimited) repetitions of the following expression
- `<name>`: Call the callout function, passing the specified name
- Captures.
Possessive alternatives and possessive repetition never backtrack. Once an alternative has matched, no later alternative in that group will be tried. Once a repetition has been made, it will never be unwound. It is your responsibility to ensure that these possessive operators match only when that match should be final. Use negative assertions to restrain inappropriate greed.
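
    #include <cstdio>   // printf; the Pegexp header must also be included

    const char* text = "abcdeefcdcddcf";
    const char* search = text;              // match() moves this to the start of the match
    Pegexp<> pegexp("+(!(dc)[a-e][c-f])");
    int length = pegexp.match(search);      // match length, or negative if there is no match
    if (length >= 0)
        printf("%.*s\n", length, search);
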
Prints:

    bcdeefcdcd

Explanation:

    +       One or more repetitions of the following group
    (       Start group
    !(dc)   Negative assertion: fail (and stop repeating) if we're looking at "dc"
    [a-e]   Any character from a to e inclusive
    [c-f]   Any character from c to f inclusive
    )       End of group

The MIT License. See the LICENSE file for details.