pegexp

Pegular expressions, aka Pegexp, formally "regular PEGs": possessive regular expressions that use prefix operators. They resemble regular expressions, but each operator precedes the expression it applies to, and matching is possessive. Implemented in C(++).

Operators

^
start of the input or start of any line
$
end of the input or the end of any line
.
any character, including a newline
?
Zero or one occurrence of the following expression
*
Zero or more occurrences of the following expression
+
One or more occurrences of the following expression
(expr)
Group subexpressions (does not imply capture)
|A|B...
Either A or B (or ...). Note that the first alternative has a preceding |
&A
Continue only if A succeeds
!A
Continue only if A fails
anychar
match that non-operator character
\char
match the escaped character (including any operator, \0 \b \e \f \n \r \t, and any other char)
\177
match the specified octal character
\xXX
match the specified hexadecimal character (digits 0-9a-fA-F)
\x{1-2}
match the specified hexadecimal character (digits 0-9a-fA-F)
\u12345
match the specified 1-5 digit Unicode character (only if compiled for Unicode support)
\u{1-5}
match the specified 1-5 digit Unicode character (only if compiled for Unicode support)
[a-z]
Normal character class (a literal hyphen may occur at start)
[^a-z]
Negated character class. Characters may include the \escapes listed above
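
The prefix alternation `|A|B...` tries its alternatives in order and commits to the first one that matches. The sketch below is a hand-written stand-in for a pattern of the shape `(|A|B...)c`, anchored at the start of the text; it does not use or reproduce the library's code, and the names `first_alternative` and `match_choice_then_c` are invented for this illustration. It is meant only to show why the order of alternatives matters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustration only: try each alternative in order at `pos` and commit to
// the first one that matches. Returns the position after the match, or
// std::string::npos if no alternative matches.
std::size_t first_alternative(const std::string& text, std::size_t pos,
                              const std::vector<std::string>& alternatives)
{
    for (const std::string& alt : alternatives)
        if (text.compare(pos, alt.size(), alt) == 0)
            return pos + alt.size();
    return std::string::npos;
}

// Hand-written equivalent of "(|A|B...)c", anchored at the start of `text`.
// Returns the match length, or -1 on failure.
int match_choice_then_c(const std::string& text,
                        const std::vector<std::string>& alternatives)
{
    std::size_t pos = first_alternative(text, 0, alternatives);
    if (pos == std::string::npos)
        return -1;
    if (pos < text.size() && text[pos] == 'c')
        return static_cast<int>(pos + 1);
    return -1;  // committed: no later alternative is tried after this failure
}
```

With the alternatives in the order `a`, `ab`, the input `abc` fails: `a` matches, the literal `c` does not match `b`, and `ab` is never tried. Listing the longer alternative first (`ab`, `a`) lets the same input match.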

NOT YET IMPLEMENTED:

{n,m}
match from n (default 0) to m (default unlimited) repetitions of the following expression.
<name>
Call the callout function passing the specified name.
Captures.

Note:

Possessive alternates and possessive repetition never backtrack. Once an alternate has matched, no subsequent alternative in that group will be tried. Once a repetition has been made, it will never be unwound. It is your responsibility to ensure that these possessive operators match only when that match is final. Use negative assertions to control inappropriate greed.
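
To make the effect of possessive repetition concrete, here is a minimal sketch that hand-codes two patterns as plain C++ loops: `*[a-z]z` and the guarded form `*(!z[a-z])z`. Both functions are anchored at the start of the text and are illustrations written for this note, not the library's implementation; the function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Illustration only: hand-written equivalent of "*[a-z]z".
// Returns the match length, or -1 on failure.
int match_unguarded(const std::string& text)
{
    std::size_t i = 0;
    // *[a-z] : consumes every lowercase letter and never gives one back
    while (i < text.size() && text[i] >= 'a' && text[i] <= 'z')
        ++i;
    // z : the repetition has already consumed any 'z', so this cannot match
    if (i < text.size() && text[i] == 'z')
        return static_cast<int>(i + 1);
    return -1;
}

// Illustration only: hand-written equivalent of "*(!z[a-z])z".
// The negative assertion !z stops the repetition before the first 'z'.
int match_guarded(const std::string& text)
{
    std::size_t i = 0;
    while (i < text.size() && text[i] != 'z'
           && text[i] >= 'a' && text[i] <= 'z')
        ++i;
    if (i < text.size() && text[i] == 'z')
        return static_cast<int>(i + 1);
    return -1;
}
```

On the input `abcz`, `match_unguarded` returns -1 because the repetition swallows the final `z`, while `match_guarded` returns 4. A backtracking regular expression engine would give the `z` back; a possessive one does not, which is why the assertion is needed.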

Example

const char*	text = "abcdeefcdcddcf";
const char*	search = text;
Pegex<>		pegexp("+(!(dc)[a-e][c-f])");

int		length = pegexp.match(search);
if (length >= 0)
	printf("%.*s\n", length, search);

Prints:

bcdeefcdcd

Explanation:

+	One or more repetitions of the following group
(	Start group
!(dc)	Negative assertion: fail (and stop repeating) if we're looking at "dc"
[a-e]	Any character from a to e inclusive
[c-f]	Any character from c to f inclusive
)	end of group
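
For readers used to conventional postfix notation, the same pattern can be written for the standard library's `std::regex` (ECMAScript grammar) as `(?:(?!dc)[a-e][c-f])+`. The sketch below is a comparison aid only: `std::regex` is a backtracking engine and is unrelated to this library, and the helper name `first_match_postfix` is invented here. The two engines agree on this input because the negative assertion means no backtracking is ever required.

```cpp
#include <regex>
#include <string>

// Illustration only: postfix rendering of the pegexp "+(!(dc)[a-e][c-f])".
//   (?: ... )+   one or more repetitions of the group
//   (?!dc)       negative lookahead, corresponds to !(dc)
// Returns the first match found in `text`, or an empty string if none.
std::string first_match_postfix(const std::string& text)
{
    static const std::regex pattern("(?:(?!dc)[a-e][c-f])+");
    std::smatch m;
    if (std::regex_search(text, m, pattern))
        return m.str();
    return std::string();
}
```

Calling `first_match_postfix("abcdeefcdcddcf")` yields `bcdeefcdcd`, the same text that the example prints.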

LICENSE

The MIT License. See the LICENSE file for details.