Performance comparison of regular expression engines
the original test environment by dark100 at
http://sljit.sourceforge.net/regex_perf.html
best view [here](https://nasciiboy.github.io/raptorVSworld/ )
Processing text or raw byte-sequence are among the common tasks performed by
most software tools. These tasks usually involve pattern matching
algorithms, and the most popular tool for such purpose is regular
expressions. Regular expressions has been evolved a lot since Kleene defined
the regular sets in the 1950’s. Today we have several widely used regular
expression engines which have different features which makes any performance
comparison a difficult task, since a faster engine is not necessary
better . Depending on the use case it might be enough to know whether a
POSIX compatible regular expression matches to a line, even the position of
the match is unneeded (grep utility ). On the contrary other use cases
require the position of capturing brackets, unicode support, conditional and
atomic block (handling a byte sequence as a single character, like ‘sch’ in
German language) support just to name a few. The former case needs a less
sophisticated algorithm, which is likely be much faster than the latter, but
again, that does not mean the former is better . More about these engine
types can be found here .
The following popular engines were choosen:
and
Before anyone jump to any conclusions, I should note the followings:
The engines were not fine tuned (because of my lack of knowledge about
their internal workings). I just compiled them with the default options. I
know enabling or disabling some features can heavily affect the results. If
you feel that you have a better configuration just drop me an e-mail and I
will update the results (mailto:nasciiboy@gmail.com ).
The regular expression engines are compiled with -O3
to allow the best performance.
This comparison page was inspired by the work of John Maddock (See his own
regex comparison here ). The input is also the same he used before:
mtent12.zip . It is a text file (e-book) which size is about 20 Mbytes.
Only common patterns are selected, they are not pathological cases nor have
any PERL specific features. The comparison was caseful.
x86-64 bit Intel Cerelon 847 1.1GHz (GCC 7.2.0, Go 1.9.1 – GNU/Linux)
Regular expression tre re2Golang re2 pcre-JIT pcre-DFA pcre onig regexp4Golang regexp4 regexp3
6486ms (19285324) 34159ms (19285324) 9072ms (19285324) 1130ms (19285324) 6677ms (19285324) 4448ms (19285324) 12876ms (19285324) 1553ms (19285324) 605ms (19285324) 1808ms (19285324)
1044ms (27084) 3166ms (27084) 134ms (27084) 55ms (27084) 68ms (27084) 65ms (27084) 127ms (27084) 1340ms (27084) 594ms (27084) 1853ms (27084)
4633ms (15451664) 32530ms (15451664) 7405ms (15451664) 931ms (15451664) 4593ms (15451664) 3086ms (15451664) 10453ms (15451664) 2056ms (15451664) 810ms (15451664) 1912ms (15451664)
2565ms (3414592) 8000ms (3414592) 1957ms (3414592) 333ms (3414592) 1619ms (3414592) 924ms (3414592) 3114ms (3414592) 1521ms (3414592) 654ms (3414592) 1059ms (3414592)
4769ms (14751878) 32506ms (14751878) 7052ms (14751878) 952ms (14751878) 4537ms (14751878) 2994ms (14751878) 10865ms (14751878) 1811ms (14750958) 819ms (14750958) 1888ms (14750958)
4762ms (14751878) 29682ms (14751878) 7066ms (14751878) 952ms (14751878) 4486ms (14751878) 2989ms (14751878) 10663ms (14751878) 2472ms (14751878) 1160ms (14751878) 3171ms (14751878)
"[a-zA-Z0-9_]" "[a-zA-Z0-9_]"
4650ms (14751878) 29437ms (14751878) 7121ms (14751878) 994ms (14751878) 4538ms (14751878) 3074ms (14751878) 10483ms (14751878) 2735ms (14751878) 1104ms (14751878) 5260ms (14751878)
2317ms (3495761) 7582ms (3495761) 2048ms (3495761) 327ms (3495761) 1551ms (3495761) 1007ms (3495761) 2956ms (3495761) 2250ms (3495761) 860ms (3495761) 2771ms (3495761)
2205ms (991813) 9541ms (3430783) 2036ms (3430783) 399ms (3430783) 1037ms (3430783) 949ms (3430783) 2676ms (3430783) 2851ms (3430783) 1363ms (3430783) 3741ms (3430783)
1591ms (314387) 5611ms (314387) 430ms (314387) 82ms (314387) 1038ms (314387) 210ms (314387) 697ms (314387) 1301ms (314387) 578ms (314387) 788ms (314387)
520ms (1781425) 2842ms (1781425) 1000ms (1781425) 139ms (1781425) 439ms (1781425) 380ms (1781425) 1434ms (1781425) 1739ms (1781425) 663ms (1781425) 1809ms (1781425)
"(((((e)))))" "<<<<<e>>>>>"
529ms (1781425) 4399ms (1781425) 1001ms (1781425) 203ms (1781425) 1269ms (1781425) 1256ms (1781425) 1971ms (1781425) 6168ms (1781425) 2360ms (1781425) 19939ms (1781425)
"((((((((((e))))))))))" "<<<<<<<<<<e>>>>>>>>>>"
525ms (1781425) 6709ms (1781425) 1006ms (1781425) 298ms (1781425) 1850ms (1781425) 1945ms (1781425) 2176ms (1781425) 12171ms (1781425) 4034ms (1781425) 58376ms (1781425)
1022ms (2388) 12ms (2388) 8ms (2388) 49ms (2388) 47ms (2388) 10ms (2388) 52ms (2388) 1610ms (2388) 589ms (2388) 2450ms (2388)
1015ms (2388) 13ms (2388) 8ms (2388) 49ms (2388) 48ms (2388) 14ms (2388) 53ms (2388) 2294ms (2388) 927ms (2388) 6027ms (2388)
1337ms (2657) 3515ms (2657) 166ms (2657) 51ms (2657) 287ms (2657) 203ms (2657) 349ms (2657) 1616ms (2657) 727ms (2657) 2575ms (2657)
"(?i:Tw)(?:(?:a?A?[i_I]{0,2})?[nN])" "(Tw)#*((a?A?[i_I]{0,2})?[nN])"
1375ms (2989) 3474ms (2989) 168ms (2989) 56ms (2989) 319ms (2989) 236ms (2989) 353ms (2989) 2154ms (2989) 1021ms (2989) 9216ms (2989)
1597ms (1877) 3614ms (1877) 277ms (1877) 47ms (1877) 2302ms (1877) 1487ms (1877) 49ms (1877) 3250ms (1877) 1267ms (1877) 5413ms (1877)
"Huck[a-zA-Z]+|Saw[a-zA-Z]+" "Huck[a-zA-Z]+|Saw[a-zA-Z]+"
1779ms (396) 4431ms (396) 126ms (396) 8ms (396) 72ms (396) 70ms (396) 117ms (396) 3894ms (396) 1456ms (396) 6599ms (396)
"[a-q][^u-z]{13}x" "[a-q][^u-z]{13}x"
4446ms (5021) 11181ms (5021) 613ms (5021) 5ms (5021) 6169ms (5021) 1742ms (5021) 135ms (5021) 9796ms (5021) 3347ms (5021) 10391ms (5021)
"Tom|Sawyer|Huckleberry|Finn" "Tom|Sawyer|Huckleberry|Finn"
3151ms (3015) 9003ms (3015) 134ms (3015) 89ms (3015) 96ms (3015) 97ms (3015) 141ms (3015) 6608ms (3015) 2608ms (3015) 10659ms (3015)
"(Tom|Sawyer|Huckleberry|Finn)" "<Tom|Sawyer|Huckleberry|Finn>"
3302ms (3015) 8876ms (3015) 131ms (3015) 87ms (3015) 100ms (3015) 102ms (3015) 142ms (3015) 7240ms (3015) 2964ms (3015) 23149ms (3015)
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]" "[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
3656ms (534) 4503ms (534) 250ms (534) 247ms (534) 889ms (534) 638ms (534) 687ms (534) 2903ms (534) 1589ms (534) 10916ms (534)
"Tom.{10,25}river|river.{10,25}Tom"
1844ms (2) 4541ms (2) 152ms (2) 44ms (2) 247ms (2) 211ms (2) 236ms (2) N/A N/A N/A
"ing[^a-zA-Z]" "ing[^a-zA-Z]"
1149ms (85956) 228ms (85956) 111ms (85956) 54ms (85956) 243ms (85956) 140ms (85956) 140ms (85956) 1611ms (85956) 627ms (85956) 3131ms (85956)
"[a-zA-Z]ing[^a-zA-Z]" "[a-zA-Z]ing[^a-zA-Z]"
1859ms (85823) 3884ms (85823) 309ms (85823) 56ms (85823) 2376ms (85823) 1549ms (85823) 143ms (85823) 3274ms (85823) 1328ms (85823) 6761ms (85823)
2070ms (95863) 5411ms (95863) 318ms (95863) 213ms (95863) 6616ms (95863) 4174ms (95863) 2603ms (95863) N/A N/A N/A
x86-64 bit AMD A6-9500 3.8GHz (GCC 7.2.1, Go 1.9.1 – GNU/Linux)
Regular expression tre re2Golang re2 pcre-JIT pcre-DFA pcre onig regexp4Golang regexp4 regexp3
2039ms (19285324) 12477ms (19285324) 3361ms (19285324) 594ms (19285324) 1673ms (19285324) 1379ms (19285324) 4188ms (19285324) 559ms (19285324) 190ms (19285324) 660ms (19285324)
347ms (27084) 1064ms (27084) 46ms (27084) 16ms (27084) 19ms (27084) 20ms (27084) 43ms (27084) 478ms (27084) 253ms (27084) 669ms (27084)
1468ms (15451664) 9661ms (15451664) 2779ms (15451664) 339ms (15451664) 1040ms (15451664) 964ms (15451664) 3649ms (15451664) 721ms (15451664) 287ms (15451664) 709ms (15451664)
908ms (3414592) 2793ms (3414592) 728ms (3414592) 113ms (3414592) 501ms (3414592) 303ms (3414592) 943ms (3414592) 531ms (3414592) 200ms (3414592) 368ms (3414592)
1483ms (14751878) 9036ms (14751878) 2637ms (14751878) 345ms (14751878) 1001ms (14751878) 927ms (14751878) 3387ms (14751878) 637ms (14750958) 305ms (14750958) 693ms (14750958)
1479ms (14751878) 9049ms (14751878) 2627ms (14751878) 345ms (14751878) 1003ms (14751878) 920ms (14751878) 3364ms (14751878) 849ms (14751878) 433ms (14751878) 1190ms (14751878)
"[a-zA-Z0-9_]" "[a-zA-Z0-9_]"
1476ms (14751878) 9898ms (14751878) 2636ms (14751878) 346ms (14751878) 1016ms (14751878) 966ms (14751878) 3384ms (14751878) 920ms (14751878) 390ms (14751878) 1953ms (14751878)
782ms (3495761) 3242ms (3495761) 744ms (3495761) 116ms (3495761) 491ms (3495761) 292ms (3495761) 977ms (3495761) 750ms (3495761) 301ms (3495761) 927ms (3495761)
658ms (991813) 3080ms (3430783) 684ms (3430783) 117ms (3430783) 329ms (3430783) 276ms (3430783) 865ms (3430783) 955ms (3430783) 484ms (3430783) 1338ms (3430783)
562ms (314387) 1909ms (314387) 149ms (314387) 32ms (314387) 342ms (314387) 64ms (314387) 234ms (314387) 433ms (314387) 174ms (314387) 246ms (314387)
166ms (1781425) 952ms (1781425) 339ms (1781425) 54ms (1781425) 149ms (1781425) 120ms (1781425) 440ms (1781425) 581ms (1781425) 231ms (1781425) 679ms (1781425)
"(((((e)))))" "<<<<<e>>>>>"
166ms (1781425) 1472ms (1781425) 347ms (1781425) 94ms (1781425) 433ms (1781425) 381ms (1781425) 540ms (1781425) 2126ms (1781425) 824ms (1781425) 7649ms (1781425)
"((((((((((e))))))))))" "<<<<<<<<<<e>>>>>>>>>>"
166ms (1781425) 2267ms (1781425) 342ms (1781425) 136ms (1781425) 841ms (1781425) 617ms (1781425) 678ms (1781425) 3981ms (1781425) 1465ms (1781425) 24102ms (1781425)
334ms (2388) 5ms (2388) 3ms (2388) 12ms (2388) 16ms (2388) 4ms (2388) 13ms (2388) 559ms (2388) 211ms (2388) 1030ms (2388)
333ms (2388) 5ms (2388) 3ms (2388) 12ms (2388) 16ms (2388) 5ms (2388) 14ms (2388) 781ms (2388) 336ms (2388) 2506ms (2388)
445ms (2657) 1204ms (2657) 65ms (2657) 13ms (2657) 96ms (2657) 71ms (2657) 117ms (2657) 543ms (2657) 246ms (2657) 1061ms (2657)
"(?i:Tw)(?:(?:a?A?[i_I]{0,2})?[nN])" "(Tw)#*((a?A?[i_I]{0,2})?[nN])"
458ms (2989) 1191ms (2989) 68ms (2989) 25ms (2989) 102ms (2989) 80ms (2989) 113ms (2989) 729ms (2989) 334ms (2989) 3617ms (2989)
524ms (1877) 1320ms (1877) 90ms (1877) 11ms (1877) 750ms (1877) 498ms (1877) 12ms (1877) 1073ms (1877) 436ms (1877) 1837ms (1877)
"Huck[a-zA-Z]+|Saw[a-zA-Z]+" "Huck[a-zA-Z]+|Saw[a-zA-Z]+"
548ms (396) 1511ms (396) 42ms (396) 4ms (396) 23ms (396) 23ms (396) 39ms (396) 1271ms (396) 494ms (396) 2483ms (396)
"[a-q][^u-z]{13}x" "[a-q][^u-z]{13}x"
1467ms (5021) 3540ms (5021) 359ms (5021) 3ms (5021) 1985ms (5021) 588ms (5021) 40ms (5021) 3306ms (5021) 1190ms (5021) 3356ms (5021)
"Tom|Sawyer|Huckleberry|Finn" "Tom|Sawyer|Huckleberry|Finn"
931ms (3015) 2794ms (3015) 44ms (3015) 24ms (3015) 30ms (3015) 31ms (3015) 47ms (3015) 2295ms (3015) 864ms (3015) 4132ms (3015)
"(Tom|Sawyer|Huckleberry|Finn)" "<Tom|Sawyer|Huckleberry|Finn>"
937ms (3015) 2956ms (3015) 44ms (3015) 26ms (3015) 31ms (3015) 33ms (3015) 47ms (3015) 2511ms (3015) 1011ms (3015) 9374ms (3015)
"[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]" "[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]"
1104ms (534) 1526ms (534) 90ms (534) 81ms (534) 288ms (534) 203ms (534) 227ms (534) 954ms (534) 594ms (534) 3782ms (534)
"Tom.{10,25}river|river.{10,25}Tom"
604ms (2) 1520ms (2) 57ms (2) 20ms (2) 76ms (2) 71ms (2) 76ms (2) N/A N/A N/A
"ing[^a-zA-Z]" "ing[^a-zA-Z]"
379ms (85956) 76ms (85956) 41ms (85956) 23ms (85956) 79ms (85956) 48ms (85956) 39ms (85956) 592ms (85956) 222ms (85956) 1255ms (85956)
"[a-zA-Z]ing[^a-zA-Z]" "[a-zA-Z]ing[^a-zA-Z]"
639ms (85823) 1341ms (85823) 103ms (85823) 24ms (85823) 767ms (85823) 485ms (85823) 39ms (85823) 1143ms (85823) 461ms (85823) 2355ms (85823)
717ms (95863) 1903ms (95863) 106ms (95863) 75ms (95863) 1766ms (95863) 1424ms (95863) 925ms (95863) N/A N/A N/A
$ wget http://www.gutenberg.org/files/3200/old/mtent12.zip
$ dos2unix mtent12.txt data.txt
$ dos2unix data.txt
$ rm mtent12*
$ ./rebuild.sh
$ ./rebench.sh
$ ./reclean.sh