High-performance .NET regex engine with predictable performance

Primary language: C#. License: MIT.

Symbolic Regex Matcher (SRM)

SRM is a high-performance regular expression matching engine with predictable performance characteristics. SRM implements a fully compatible subset of the .NET regex language, which mainly omits non-regular features. Its throughput is comparable to that of popular native libraries, such as RE2, while the codebase is pure C#.

SRM combines advanced symbolic reasoning with a matching approach based on regex derivatives. For an overview of the theory behind SRM, please see: Olli Saarikivi, Margus Veanes, Tiki Wan, Eric Xu. Symbolic Regex Matcher. In TACAS 2019.
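
To give a rough feel for what matching with regex derivatives means, here is a minimal sketch of classic Brzozowski derivatives over a tiny regex AST. It is not SRM's implementation: the type and method names (Re, Chr, Cat, Alt, Star, Derivatives) are made up for this illustration, it works on single characters rather than symbolic character sets, it only does whole-string matching, and it performs no simplification of the derived terms.

```csharp
using System;

// Tiny regex AST. All names here are illustrative, not part of SRM.
abstract class Re
{
    public static readonly Re Nothing = new NothingRe(); // matches no string
    public static readonly Re Epsilon = new EpsilonRe(); // matches only ""
}
sealed class NothingRe : Re { }
sealed class EpsilonRe : Re { }
sealed class Chr : Re { public readonly char C; public Chr(char c) { C = c; } }
sealed class Cat : Re { public readonly Re L, R; public Cat(Re l, Re r) { L = l; R = r; } }
sealed class Alt : Re { public readonly Re L, R; public Alt(Re l, Re r) { L = l; R = r; } }
sealed class Star : Re { public readonly Re Inner; public Star(Re inner) { Inner = inner; } }

static class Derivatives
{
    // True if r accepts the empty string.
    public static bool IsNullable(Re r)
    {
        if (r is EpsilonRe || r is Star) return true;
        if (r is Cat k) return IsNullable(k.L) && IsNullable(k.R);
        if (r is Alt a) return IsNullable(a.L) || IsNullable(a.R);
        return false; // NothingRe and Chr
    }

    // Derivative of r with respect to c: a regex matching every w such that r matches c + w.
    public static Re Derive(Re r, char c)
    {
        if (r is Chr x) return x.C == c ? Re.Epsilon : Re.Nothing;
        if (r is Alt a) return new Alt(Derive(a.L, c), Derive(a.R, c));
        if (r is Star s) return new Cat(Derive(s.Inner, c), s);
        if (r is Cat k)
        {
            Re left = new Cat(Derive(k.L, c), k.R);
            // If the left part can match "", c may also be consumed by the right part.
            return IsNullable(k.L) ? new Alt(left, Derive(k.R, c)) : left;
        }
        return Re.Nothing; // NothingRe and EpsilonRe
    }

    // Whole-string match: take one derivative per input character, never backtracking.
    public static bool IsMatch(Re r, string input)
    {
        foreach (char c in input) r = Derive(r, c);
        return IsNullable(r);
    }
}

static class Program
{
    static void Main()
    {
        Re ab = new Star(new Cat(new Chr('a'), new Chr('b'))); // (ab)*
        Console.WriteLine(Derivatives.IsMatch(ab, "abab")); // True
        Console.WriteLine(Derivatives.IsMatch(ab, "aba"));  // False
    }
}
```

Each input character is consumed exactly once, which is the intuition behind derivative-based matchers avoiding backtracking. Because this sketch never simplifies the derived terms, they can grow with the input; a practical engine would need to normalize or cache them.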

Usage

The API mostly follows that of System.Text.RegularExpressions:

using Microsoft.SRM;
...
string input = "Hello World!";
var regex = new Regex(".l+."); // any character, one or more 'l', any character
bool hasLs = regex.IsMatch(input); // True
var matches = regex.Matches(input); // list of Match structs for "ello" and "rld"
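
For comparison, the following sketch runs a similar snippet against the built-in System.Text.RegularExpressions engine, using the pattern ".l+." (any character, one or more 'l', any character). The class name BuiltInRegexDemo is made up for this illustration. Assuming a pattern stays within the subset SRM supports, the two snippets should differ mainly in the using directive.

```csharp
using System;
using System.Text.RegularExpressions;

static class BuiltInRegexDemo
{
    static void Main()
    {
        string input = "Hello World!";
        var regex = new Regex(".l+.");

        Console.WriteLine(regex.IsMatch(input)); // True

        // Prints "1: ello" and then "8: rld".
        foreach (Match m in regex.Matches(input))
            Console.WriteLine($"{m.Index}: {m.Value}");
    }
}
```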

Building and running tests

The library is built and tested with .NET Core 3.1. To build the project and run the tests, run:

dotnet build
dotnet test

Regenerating Unicode character tables

SRM uses Unicode character tables recovered from the .NET runtime. To regenerate them for a new version of the runtime, run:

cd unicode_table_gen
dotnet run ../srm/unicode