Here you can find the main implementation of REmatch library in C++, and its bindings for both Python and JavaScript/WebAssembly. This version has been refactorized, tested, and developed for being ready for production.
/src
: The C++ implementation./tests
: Contains all automatic tests for the code./scripts
: Contains useful scripts for setting up the project, including compilation and other tasks.
Dependencies:
- Clang version 11 or newer
- CMake version 3.23.2 or newer
- Catch2 (required for tests only)
The setup of CMake, Catch2 and Clang on a clean ubuntu system can be done using the following script:
./scripts/setup/ubuntu/setup_cpp_to_execute_rematch.sh
The target file is rematch.cpp
, which should be located in the src/targets/main/
directory. To build, run:
./scripts/compile.sh
After building, the binary file will be located in the build/Release/bin
folder. To try it simply run:
./build/Release/bin/rematch
If you want to use a debugger such as gdb
, then you should add -DCMAKE_BUILD_TYPE=Debug
in the first CMake command.
To use REmatch, you have two options. You can create a Query
object through the method REMatch::compile
and pass the regular expression as argument, or you can directly call the functions provided.
// Compile a regular expression using the compile method and find a match
REMatch::Query query = REMatch::compile("!x{aba}");
std::unique_ptr<REMatch::Match> query_match = query.findone("aabaa");
// Equivalent to calling findone directly
std::unique_ptr<REMatch::Match> direct_match = REMatch::findone("!x{aba}", "aabaa");
The Query
provides the methods findone
and finditer
that evaluate a document and return the matches found. The findone
method returns a pointer to the first encountered match, while finditer
returns an iterator that allows you to access all matches. To retrieve all the matches at once, you can use the findall
method. You can use the start
and end
methods to obtain the indices of the matched spans or span
to get a string representation.
#include <memory>
#include <string>
#include "library_interface/rematch.hpp"
int main() {
std::string document = "abaababa";
REMatch::Query query = REMatch::compile("!x{aba}");
std::unique_ptr<REMatch::MatchIterator> iterator = query.finditer(document);
std::unique_ptr<REMatch::Match> match = iterator->next();
while (match != nullptr) {
std::cout << "Span: [" << match->start("x") << ", " << match->end("x") << ">" << std::endl;
match = iterator->next();
}
return 0;
}
#include <iostream>
#include <string>
#include "library_interface/rematch.hpp"
int main() {
std::string document = "abaababa";
REMatch::Query query = REMatch::compile("!x{aba}");
std::unique_ptr<REMatch::Match> match = query.findone(document);
std::cout << "Span: " << match->span("x") << std::endl;
return 0;
}
#include <iostream>
#include <string>
#include "library_interface/rematch.hpp"
int main() {
std::string document = "aba";
std::string query = "!x{aba}";
std::vector<REMatch::Match> matches = REMatch::library_interface::findall(query, document);
for (REMatch::Match& match: matches) {
std::cout << "Span: [" << match.start("x") << ", " << match.end("x") << ">" << std::endl;
}
return 0;
}
We are using Catch2 for unit testing.
To add more tests, create files of the form: tests/<module_name>/<class_tested>.cpp
and add these files to the TEST_SOURCES of tests/CMakeLists.txt
.
To build the automatic documentation two packages are needed: graphviz and doxygen. To build the documentation run doxygen DoxyFile
.
We are using the Tracy profiler to measure the time spent in specific code segments. To profile the code using the graphical user interface, follow these steps:
- Set the
ENABLE_PROFILING
flag to ON in the CMakeLists.txt file. - Compile the code with the updated CMakeLists.
- Navigate to
build/Debug/_deps/tracy-src/profiler/build/unix
and runmake
to generate the "Tracy-release" executable. - Execute the "Tracy-release" executable.
- Run REmatch to initiate the profiling.
You should be able to view the results in the graphical interface.
This implementation is based on the paper REmatch: a novel regex engine for finding all matches by Cristian Riveros, Nicolás Van Sint Jan, and Domagoj Vrgoč.
To begin, follow the official installation tutorial for Emscripten available at https://emscripten.org/docs/getting_started/downloads.html
Once the installation is complete, execute the following command:
./scripts/compile_emscripten.sh
To install the release version from PyPI, run:
pip install pyrematch
To build from the source code, clone this repository and run:
pip install .