Parabix LLVM

This repository contains the implementation for the thesis "High-Performance Regular Expression Matching with Parabix and LLVM", which can also be found here.

This project was done as part of the TUM Database Implementation practical course.

Implementation

This repository contains both the iterative and the LLVM codegen approaches for Parabix; they are located in parabix_cpp and parabix_llvm, respectively.

You may also want to check the Parabix compiler (parabix_compiler.cc), which generates code through the LLVM IRBuilder API.
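
To give a feel for the bit-parallel idea behind Parabix, here is a minimal sketch for the benchmark pattern a[0-9]*z on a single 64-character block. It is not taken from parabix_cpp or parabix_llvm: the names char_class_stream, match_star and match_ends are hypothetical, and the input is assumed to fit in one 64-bit word (longer text is ignored). Each character class becomes a bit stream, and the Kleene star is handled with the well-known MatchStar identity, which uses one addition to carry a marker through a whole run of class characters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bit i of the result is set iff text[i] satisfies pred.
// Assumption: only the first 64 characters are considered.
template <typename Pred>
std::uint64_t char_class_stream(const std::string& text, Pred pred) {
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < text.size() && i < 64; ++i) {
        if (pred(text[i])) {
            bits |= std::uint64_t{1} << i;
        }
    }
    return bits;
}

// MatchStar(M, C): every position reachable from a marker in M by
// consuming zero or more characters of class C. The addition makes
// the carry ripple through each run of C that starts at a marker.
std::uint64_t match_star(std::uint64_t m, std::uint64_t c) {
    return (((m & c) + c) ^ c) | m;
}

// Bit i of the result is set iff a match of a[0-9]*z ends at text[i].
std::uint64_t match_ends(const std::string& text) {
    const std::uint64_t a = char_class_stream(text, [](char ch) { return ch == 'a'; });
    const std::uint64_t d = char_class_stream(text, [](char ch) { return ch >= '0' && ch <= '9'; });
    const std::uint64_t z = char_class_stream(text, [](char ch) { return ch == 'z'; });

    const std::uint64_t after_a = a << 1;                      // marker just past each 'a'
    const std::uint64_t after_digits = match_star(after_a, d); // skip any run of digits
    return after_digits & z;                                   // marker must now sit on a 'z'
}
```

For the input "xa9z a z", match_ends returns a word with only bit 3 set: the match "a9z" ends at index 3, while the later "a z" is rejected because a space is not a digit. A full implementation would also have to transpose the input into bit streams and propagate carries between blocks, which this sketch leaves out.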

Presentation

The PDF document used during the presentation can be found here.

Benchmark

The relevant files are generator and benchmark.

size/algo   std::regex   parabix-cpp   parabix-llvm
10MB        0.22         0.12          0.016
100MB       2.2          1.2           0.12
500MB       11           6             0.6
1GB         23           13            1.2
1.2GB       25           15            1.4

NOTE: Time to read input data from a file is excluded from the elapsed times. The pattern is a[0-9]*z.
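
As an illustration of the note above, this is one way a std::regex baseline could be timed so that only the matching itself is measured. It is a sketch and not the code in the benchmark file: the helper names count_matches and time_count_seconds are hypothetical, and the input is assumed to be fully loaded into a std::string before the clock starts.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count non-overlapping matches of re in text.
std::size_t count_matches(const std::string& text, const std::regex& re) {
    const auto first = std::sregex_iterator(text.begin(), text.end(), re);
    const auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}

// Time only the matching step; text must already be in memory,
// so file I/O is not part of the measured interval.
double time_count_seconds(const std::string& text, const std::regex& re,
                          std::size_t& count_out) {
    const auto start = std::chrono::steady_clock::now();
    count_out = count_matches(text, re);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_count_seconds(text, std::regex("a[0-9]*z"), n) after the file has been read keeps I/O out of the reported number. Constructing the std::regex outside the timed call also keeps pattern compilation out of it.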

Wanna try?

mkdir build
cd build
# use the Ninja generator, since the steps below call ninja
cmake -G Ninja ../
# generate input file
ninja generator
./generator 1000 ../1gb.txt
# run benchmark
ninja benchmark
./benchmark
# run vgrep
ninja vgrep_llvm
./vgrep_llvm ../1gb.txt "a[0-9]*z"