RegexPerf

Another benchmark of Java regex libraries

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

Regex bench

This project uses JMH (Code Tools: jmh) to benchmark several regex libraries for Java, and tries to determine the fastest way to run them.

It uses ideas and code from "Java Regular expression library benchmarks" and "Java regular expression library benchmarks – 2015", and adds some new libraries.

The joni library works on byte[] instead of a String, so I also benchmark different ways to extract the bytes from a String, to find the fastest. Each variation is also tested with joni, to check that it is really useful.
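As an illustration of the kind of String to byte[] conversions being compared, here is a minimal sketch using only the JDK. The class and method names (AsciiBytes, byCharset, byName, manualCopy, copyInto) are hypothetical and are not taken from this project's StringToBytes benchmark, and the manual variants assume the input is pure ASCII.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only; not this project's code.
final class AsciiBytes {

    // Encode using a Charset constant (no charset lookup by name).
    static byte[] byCharset(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Encode using a charset name, which is resolved at call time.
    static byte[] byName(String s) {
        try {
            return s.getBytes("US-ASCII");
        } catch (UnsupportedEncodingException e) {
            // US-ASCII is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Manual copy into a fresh array. Assumes ASCII input: each char is
    // truncated to its low 8 bits, so non-ASCII text would be corrupted.
    static byte[] manualCopy(String s) {
        int len = s.length();
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Manual copy into a caller-supplied buffer, avoiding an allocation per
    // call. Returns the number of bytes written. Same ASCII assumption.
    static int copyInto(String s, byte[] buffer) {
        int len = s.length();
        if (len > buffer.length) {
            throw new IllegalArgumentException("buffer too small");
        }
        for (int i = 0; i < len; i++) {
            buffer[i] = (byte) s.charAt(i);
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] reused = new byte[64];
        int n = copyInto("GET /index.html", reused);
        System.out.println(byCharset("GET /index.html").length + " bytes, " + n + " copied into the reused buffer");
    }
}
```

With a reused buffer, the byte count returned by copyInto has to be passed on to the matcher as the end offset, since the buffer is usually longer than the text.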

Grok is a thick wrapper around Java's regex, inspired by Logstash.
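To give an idea of what a grok-style wrapper adds on top of java.util.regex, here is a minimal sketch that expands %{NAME:field} references into named capturing groups before compiling. MiniGrok and its methods are hypothetical names for illustration; this is not the io.thekraken.grok API, and the real library does considerably more.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified grok-like wrapper; not the io.thekraken.grok API.
final class MiniGrok {
    // Matches references such as %{INT:port}: a pattern name, then a field name.
    private static final Pattern REF =
            Pattern.compile("%\\{(\\w+):([A-Za-z][A-Za-z0-9]*)\\}");

    private final Map<String, String> library = new LinkedHashMap<>();

    // Register a named regex fragment, e.g. define("INT", "[0-9]+").
    void define(String name, String regex) {
        library.put(name, regex);
    }

    // Expand every %{NAME:field} into (?<field>fragment), then compile the
    // result with the JDK regex engine.
    Pattern compile(String expression) {
        Matcher m = REF.matcher(expression);
        StringBuffer expanded = new StringBuffer();
        while (m.find()) {
            String fragment = library.get(m.group(1));
            if (fragment == null) {
                throw new IllegalArgumentException("unknown pattern: " + m.group(1));
            }
            String group = "(?<" + m.group(2) + ">" + fragment + ")";
            // quoteReplacement keeps backslashes in the fragment literal.
            m.appendReplacement(expanded, Matcher.quoteReplacement(group));
        }
        m.appendTail(expanded);
        return Pattern.compile(expanded.toString());
    }

    public static void main(String[] args) {
        MiniGrok grok = new MiniGrok();
        grok.define("WORD", "\\w+");
        grok.define("INT", "[0-9]+");
        Pattern p = grok.compile("%{WORD:user} logged in from port %{INT:port}");
        Matcher m = p.matcher("alice logged in from port 22");
        if (m.matches()) {
            System.out.println(m.group("user") + " " + m.group("port"));
        }
    }
}
```

Since the matching is still done by java.util.regex, a wrapper of this kind would be expected to cost at least as much as the plain JDK engine, plus its own overhead.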

To run it:

mvn clean package site && java -jar target/regexperf.jar

The -f option makes a quick run, to check the code. The names of the benchmarks to run can be given on the command line.

The Maven reports include a version check, to ensure that all the regex libraries are at their latest versions.

First, a test with the latest Java 8 (1.8.0_152):

$ lscpu 
CPU(s):                8
Model name:            Intel Xeon E312xx (Sandy Bridge)
CPU MHz:               2593.748

# JMH version: 1.19
# VM version: JDK 1.8.0_152, VM 25.152-b16
# VM invoker: /usr/java/jdk1.8.0_152/jre/bin/java
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 10 iterations, 30 s each
# Timeout: 88 s per iteration
# Threads: 3 threads, will synchronize iterations

Benchmark                                                       Mode  Cnt      Score    Error  Units
RegexPerf.big_org_joni_ascii_reuse                              avgt   10    132.353 ±  0.903  us/op
RegexPerf.big_java_util_regex                                   avgt   10  14222.734 ± 49.692  us/op
RegexPerf.big_io_thekraken_grok_api                             avgt   10  14566.463 ± 79.139  us/op

RegexPerf.jregex                                                avgt   10     14.245 ±  0.068  us/op
RegexPerf.org_joni_ascii_unsafe                                 avgt   10     18.225 ±  1.214  us/op
RegexPerf.org_joni_ascii_reuse                                  avgt   10     18.694 ±  1.692  us/op
RegexPerf.org_joni_ascii                                        avgt   10     19.061 ±  2.007  us/op
RegexPerf.org_joni                                              avgt   10     22.776 ±  4.679  us/op
RegexPerf.java_util_regex                                       avgt   10     58.757 ±  0.391  us/op
RegexPerf.io_thekraken_grok_api                                 avgt   10     64.270 ±  0.634  us/op
RegexPerf.org_apache_xerces_impl_xpath_regex_RegularExpression  avgt   10    150.605 ±  1.434  us/op
RegexPerf.gnu_regexp                                            avgt   10    220.395 ±  2.042  us/op

StringToBytes.getBytesAsciiUnsafe                               avgt   10     22.972 ±  0.225  ns/op
StringToBytes.getBytesAsciiReuse                                avgt   10     33.220 ±  0.229  ns/op
StringToBytes.byCharset_US_ASCII                                avgt   10     39.574 ±  0.181  ns/op
StringToBytes.getBytesAscii                                     avgt   10     43.377 ±  0.213  ns/op
StringToBytes.getBytesUTF16LE                                   avgt   10     60.789 ±  0.584  ns/op
StringToBytes.byName_US_ASCII                                   avgt   10     61.946 ±  0.555  ns/op
StringToBytes.byName_UTF_8                                      avgt   10     69.874 ±  0.636  ns/op
StringToBytes.byCharset_UTF_8                                   avgt   10     78.423 ±  0.396  ns/op
StringToBytes.byName_UTF_16                                     avgt   10    131.655 ±  1.696  ns/op
StringToBytes.byCharsetEncoder_US_ASCII                         avgt   10    266.363 ± 16.720  ns/op
StringToBytes.byCharset_UTF_16                                  avgt   10    328.958 ±  5.447  ns/op

And one with Java 9 (9.0.1):

# JMH version: 1.19
# VM version: JDK 9.0.1, VM 9.0.1+11
# VM invoker: /usr/java/jdk-9.0.1/bin/java
# VM options: <none>
# Warmup: 5 iterations, 1 s each
# Measurement: 10 iterations, 30 s each
# Timeout: 88 s per iteration
# Threads: 3 threads, will synchronize iterations

Benchmark                                                       Mode  Cnt     Score     Error  Units
RegexPerf.big_org_joni_ascii_reuse                              avgt   10   128.855 ±   1.509  us/op
RegexPerf.big_io_thekraken_grok_api                             avgt   10  9247.838 ± 172.971  us/op
RegexPerf.big_java_util_regex                                   avgt   10  9255.848 ± 249.751  us/op

RegexPerf.jregex                                                avgt   10    13.634 ±   0.187  us/op
RegexPerf.org_joni_ascii                                        avgt   10    20.653 ±   1.148  us/op
RegexPerf.org_joni_ascii_reuse                                  avgt   10    21.571 ±   0.850  us/op
RegexPerf.org_joni                                              avgt   10    23.943 ±   0.524  us/op
RegexPerf.java_util_regex                                       avgt   10    37.995 ±   0.756  us/op
RegexPerf.io_thekraken_grok_api                                 avgt   10    44.095 ±   0.957  us/op
RegexPerf.org_apache_xerces_impl_xpath_regex_RegularExpression  avgt   10   121.135 ±   1.173  us/op
RegexPerf.gnu_regexp                                            avgt   10   171.659 ±   2.666  us/op

StringToBytes.byCharset_US_ASCII                                avgt   10    29.405 ±   0.280  ns/op
StringToBytes.getBytesAsciiReuse                                avgt   10    32.157 ±   0.302  ns/op
StringToBytes.byName_US_ASCII                                   avgt   10    36.610 ±   0.487  ns/op
StringToBytes.getBytesAscii                                     avgt   10    40.438 ±   0.412  ns/op
StringToBytes.byCharset_UTF_8                                   avgt   10    50.512 ±   0.591  ns/op
StringToBytes.byName_UTF_8                                      avgt   10    58.353 ±   0.909  ns/op
StringToBytes.getBytesUTF16LE                                   avgt   10    59.463 ±   0.724  ns/op
StringToBytes.byCharsetEncoder_US_ASCII                         avgt   10    90.954 ±   1.782  ns/op
StringToBytes.byName_UTF_16                                     avgt   10   120.022 ±   1.346  ns/op
StringToBytes.byCharset_UTF_16                                  avgt   10   379.761 ±   4.262  ns/op

The unsafe benchmarks (org_joni_ascii_unsafe and getBytesAsciiUnsafe) are absent from the Java 9 run, because the internal structure of String changed in Java 9.