zig-regex

A regex implementation for the Zig programming language.

Primary language: Zig. License: MIT.

An automaton-based regex implementation for Zig.

Note: This is still a work in progress and many things still need to be done.

  • Capture group support
  • UTF-8 support
  • More tests (plus some automated tests/fuzzing)
  • Add a PikeVM implementation
  • Literal optimizations and general performance improvements

Usage

const debug = @import("std").debug;
const Regex = @import("regex").Regex;

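test "example" {
    // Compile the pattern once; compile returns an error for an invalid pattern.
    var re = try Regex.compile(debug.global_allocator, "\\w+");

    // match is anchored at the start and must cover the whole input.
    debug.assert(try re.match("hej"));
}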

API

Regex

pub fn compile(a: Allocator, re: []const u8) !Regex

Compiles a regex string, returning any error encountered during parsing or compilation.


pub fn match(re: *Regex, input: []const u8) !bool

Match a compiled regex against some input. The match is anchored at the first index and must cover the input in its entirety.


pub fn partialMatch(re: *Regex, input: []const u8) !bool

Match a compiled regex against some input. Unlike match, this finds the leftmost match and does not need to be anchored to the start of the input.
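The difference between the two functions can be sketched as follows. This is a minimal illustration that reuses the imports and allocator from the Usage example; the input string is made up, and the expected results follow from the descriptions above rather than from a verified run against a particular version of the library.

```zig
const debug = @import("std").debug;
const Regex = @import("regex").Regex;

test "match versus partialMatch (sketch)" {
    var re = try Regex.compile(debug.global_allocator, "\\w+");

    // Anchored: the leading space is not a word character, so match fails.
    debug.assert(!(try re.match(" hej")));

    // Unanchored: the leftmost match "hej" is found after the space.
    debug.assert(try re.partialMatch(" hej"));
}
```

The input was chosen so that match fails at the very first byte, which keeps the example independent of how trailing input is handled.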


pub fn captures(re: *Regex, input: []const u8) !?Captures

Match a compiled regex against some input. Returns the matched slice for each capture group in the regex, with index 0 being the entire match.

If no match was found, null is returned.
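A possible way to consume the result is sketched below. It assumes that capture groups are available in the version in use, that groups are numbered from 1 in order of their opening parenthesis, and that `@` is treated as a literal character; the pattern and input are hypothetical. The imports and allocator mirror the Usage example.

```zig
const std = @import("std");
const debug = std.debug;
const mem = std.mem;
const Regex = @import("regex").Regex;

test "captures (sketch)" {
    var re = try Regex.compile(debug.global_allocator, "(\\w+)@(\\w+)");
    const input = "user@host";

    // captures returns !?Captures: `try` handles the error,
    // and the `if` payload capture unwraps the optional.
    if (try re.captures(input)) |caps| {
        // Group 0 is the entire match.
        debug.assert(mem.eql(u8, caps.sliceAt(0).?, "user@host"));
        // Groups 1 and 2 are the parenthesised sub-expressions.
        debug.assert(mem.eql(u8, caps.sliceAt(1).?, "user"));
        debug.assert(mem.eql(u8, caps.sliceAt(2).?, "host"));
    } else {
        // No match was found, so captures returned null.
    }
}
```

Because sliceAt returns an optional, each result is unwrapped with `.?` before being compared.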

Captures

pub fn sliceAt(captures: *const Captures, n: usize) ?[]const u8

Return the sub-slice for the numbered capture group. 0 refers to the entire match.

pub fn boundsAt(captures: *const Captures, n: usize) ?Span

Return the lower and upper byte positions for the specified capture group.

The bounds can be used to recover the same sub-slice that sliceAt returns:

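// Assumes caps came from re.captures(input) and mem is @import("std").mem.
// Both functions return optionals, so unwrap them with `.?`.
const span = caps.boundsAt(0).?;
debug.assert(mem.eql(u8, caps.sliceAt(0).?, input[span.lower..span.upper]));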

References

See the following useful sources: