SRM is a high-performance regular expression matching engine with predictable performance characteristics. SRM implements a fully compatible subset of the .NET regex language, which mainly omits non-regular features. It provides comparable throughput to popular native libraries, such as RE2, with a pure C# codebase.
SRM combines symbolic reasoning with a matching approach based on regex derivatives. For an overview of the theory behind SRM, please see: Olli Saarikivi, Margus Veanes, Tiki Wan, Eric Xu. Symbolic Regex Matcher. In TACAS 2019.
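For readers unfamiliar with derivative-based matching, the following is a minimal textbook sketch of Brzozowski derivatives over plain characters. It is not SRM's implementation: SRM works symbolically over character sets and is optimized for throughput, while this sketch does no simplification at all. All type and method names here (`Re`, `Derive`, `FullMatch`, and so on) are hypothetical and chosen only for illustration.

```csharp
using System;

// Illustrative only: classic Brzozowski derivatives, not SRM's symbolic engine.
abstract class Re
{
    public abstract bool Nullable { get; }   // does this regex accept the empty string?
    public abstract Re Derive(char c);       // regex matching what may follow after reading c
}

sealed class Empty : Re                      // matches nothing
{
    public override bool Nullable => false;
    public override Re Derive(char c) => this;
}

sealed class Eps : Re                        // matches only the empty string
{
    public override bool Nullable => true;
    public override Re Derive(char c) => new Empty();
}

sealed class Chr : Re                        // matches one specific character
{
    private readonly char ch;
    public Chr(char ch) { this.ch = ch; }
    public override bool Nullable => false;
    public override Re Derive(char c) => c == ch ? (Re)new Eps() : new Empty();
}

sealed class Any : Re                        // matches any single character, like '.'
{
    public override bool Nullable => false;
    public override Re Derive(char c) => new Eps();
}

sealed class Alt : Re                        // alternation: l | r
{
    private readonly Re l, r;
    public Alt(Re l, Re r) { this.l = l; this.r = r; }
    public override bool Nullable => l.Nullable || r.Nullable;
    public override Re Derive(char c) => new Alt(l.Derive(c), r.Derive(c));
}

sealed class Cat : Re                        // concatenation: l r
{
    private readonly Re l, r;
    public Cat(Re l, Re r) { this.l = l; this.r = r; }
    public override bool Nullable => l.Nullable && r.Nullable;
    public override Re Derive(char c)
    {
        Re d = new Cat(l.Derive(c), r);
        // If l can match the empty string, c may also be consumed by r.
        return l.Nullable ? new Alt(d, r.Derive(c)) : d;
    }
}

sealed class Star : Re                       // Kleene star: r*
{
    private readonly Re r;
    public Star(Re r) { this.r = r; }
    public override bool Nullable => true;
    public override Re Derive(char c) => new Cat(r.Derive(c), this);
}

static class Demo
{
    // Whole-input match: derive by each character, then test nullability.
    public static bool FullMatch(Re re, string input)
    {
        foreach (char c in input) re = re.Derive(c);
        return re.Nullable;
    }

    static void Main()
    {
        // The pattern .l+. written out as  . l l* .
        Re re = new Cat(new Any(),
                new Cat(new Chr('l'),
                new Cat(new Star(new Chr('l')), new Any())));
        Console.WriteLine(FullMatch(re, "ello")); // True
        Console.WriteLine(FullMatch(re, "He"));   // False
    }
}
```

Each input character is consumed exactly once and there is no backtracking, which is the property behind the predictable performance of derivative-based engines. A practical engine would additionally simplify and cache the derived expressions so that they behave like automaton states.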
The API mostly follows that of System.Text.RegularExpressions:
using Microsoft.SRM;
...
string input = "Hello World!";
var regex = new Regex(".l+.");
bool hasLs = regex.IsMatch(input); // True
var matches = regex.Matches(input); // list of Match structs for "ello" and "rld"
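Because the API is modeled on System.Text.RegularExpressions, it can help to run the same calls against the standard .NET engine as a reference point. The sketch below uses only the standard library, not SRM, with the pattern `.l+.`; the class and method names (`BclReference`, `MatchValues`) are hypothetical helpers introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BclReference
{
    // Collects the text of every match found by the standard .NET regex engine.
    public static List<string> MatchValues(string pattern, string input)
    {
        var values = new List<string>();
        foreach (Match m in new Regex(pattern).Matches(input))
            values.Add(m.Value);
        return values;
    }

    static void Main()
    {
        string input = "Hello World!";
        Console.WriteLine(new Regex(".l+.").IsMatch(input)); // True
        foreach (string value in MatchValues(".l+.", input))
            Console.WriteLine(value);                        // ello, then rld
    }
}
```

Since SRM supports only a subset of the .NET regex language, patterns that rely on non-regular features are not expected to carry over from this reference.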
The library is built and tested with .NET Core 3.1. To build the project and run the tests, run:
dotnet build
dotnet test
SRM uses Unicode character tables recovered from the .NET runtime. To regenerate them for a new version of the runtime, run:
cd unicode_table_gen
dotnet run ../srm/unicode