An automaton-based regex implementation for zig.
Note: This is still a work in progress and many things still need to be done.
- Capture group support
- UTF-8 support
- More tests (plus some automated tests/fuzzing)
- Add a PikeVM implementation
- Literal optimizations and just general performance improvements.
const debug = @import("std").debug;
const Regex = @import("regex").Regex;
test "example" {
var re = try Regex.compile(debug.global_allocator, "\\w+");
debug.assert(try re.match("hej") == true);
}
fn compile(a: Allocator, re: []const u8) !Regex
Compiles a regex string, returning any errors during parsing/compiling.
pub fn match(re: *Regex, input: []const u8) !bool
Match a compiled regex against some input. The input must be matched in its entirety and from the first index.
pub fn partialMatch(re: *Regex, input: []const u8) !bool
Match a compiled regex against some input. Unlike match
, this matches the
leftmost and does not have to be anchored to the start of input
.
pub fn captures(re: *Regex, input: []const u8) !?Captures
Match a compiled regex against some input. Returns a list of all matching slices in the regex with the first (0-index) being the entire regex.
If no match was found, null is returned.
pub fn sliceAt(captures: *const Captures, n: usize) ?[]const u8
Return the sub-slice for the numbered capture group. 0 refers to the entire match.
pub fn boundsAt(captures: *const Captures, n: usize) ?Span
Return the lower and upper byte positions for the specified capture group.
We can retrieve the sub-slice using this function:
const span = caps.boundsAt(0)
debug.assert(mem.eql(u8, caps.sliceAt(0), input[span.lower..span.upper]));
See the following useful sources: