
Performance comparison of REx-generated parsers to those of other parser generators.

REx Parser Benchmark

This is a comparison of the performance of parsers generated by REx Parser Generator to parsers from other parser generators. The task for each parser is to parse JSON input and create a result data structure. All parsers in a test must create the same result.

Test plan

After a warm-up phase, the parsers are executed one after another on the given input, each for a minimum runtime (repeatedly, if time permits). Runtime and memory usage are collected and logged.
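The measurement step described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the class `TimedRun`, its methods, the MB/s unit, and the stand-in "parser" in `main` are all hypothetical, and heap usage read from `Runtime` is only an approximation.

```java
final class TimedRun {
    /** Outcome of one timed run (hypothetical type, for illustration only). */
    static final class Measurement {
        final long iterations;    // number of completed parses
        final long elapsedNanos;  // wall-clock time spent in the loop
        final long usedHeapBytes; // approximate heap in use after the loop

        Measurement(long iterations, long elapsedNanos, long usedHeapBytes) {
            this.iterations = iterations;
            this.elapsedNanos = elapsedNanos;
            this.usedHeapBytes = usedHeapBytes;
        }

        /** Throughput in MB/s, assuming every iteration parsed inputBytes bytes. */
        double megabytesPerSecond(long inputBytes) {
            double seconds = elapsedNanos / 1e9;
            return (iterations * (double) inputBytes) / (1024.0 * 1024.0) / seconds;
        }
    }

    /** Approximate heap usage as seen by the JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task at least once, repeating until minNanos have elapsed. */
    static Measurement runFor(Runnable parse, long minNanos) {
        long start = System.nanoTime();
        long iterations = 0;
        long elapsed;
        do {
            parse.run();
            iterations++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        return new Measurement(iterations, elapsed, usedHeapBytes());
    }

    public static void main(String[] args) {
        String input = "[[1,2],[3,4]]";
        // Stand-in for a real parser: it only counts opening brackets.
        Runnable parse = () -> {
            long n = input.chars().filter(c -> c == '[').count();
            if (n != 3) throw new IllegalStateException("unexpected count");
        };
        runFor(parse, 50_000_000L);                 // warm-up, result discarded
        Measurement m = runFor(parse, 100_000_000L); // measured run
        System.out.println(m.iterations + " iterations, "
                + m.megabytesPerSecond(input.length()) + " MB/s, "
                + m.usedHeapBytes + " bytes of heap in use");
    }
}
```

Running the task at least once, even when a single parse exceeds the minimum runtime, matches the "repeatedly, if time permits" wording above.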

Before the next test cycle, the input size is increased by a given factor. This is done by wrapping multiple instances of each input file's content into JSON arrays. By default, a single flat top level array is used to do this, but there is an option to nest arrays deeply, i.e. wrapping multiple instances from the previous cycle's array into a new one.
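The difference between the flat and the nested mode can be illustrated with a short sketch. The class `InputGrowth` and its methods are hypothetical names, and treating cycle 0 as the unwrapped file content is an assumption made for this example only.

```java
import java.util.Collections;

final class InputGrowth {
    /** Wraps the given number of copies of item into one JSON array. */
    static String wrap(String item, int copies) {
        return "[" + String.join(",", Collections.nCopies(copies, item)) + "]";
    }

    /** Flat mode: a single top-level array holding factor^cycle copies of the content. */
    static String flatCycle(String content, int factor, int cycle) {
        if (cycle == 0) return content; // assumption: cycle 0 is the raw content
        int copies = 1;
        for (int i = 0; i < cycle; i++) {
            copies = Math.multiplyExact(copies, factor); // fails loudly on overflow
        }
        return wrap(content, copies);
    }

    /** Nested mode: each cycle wraps factor copies of the previous cycle's input. */
    static String nestedCycle(String content, int factor, int cycle) {
        String input = content;
        for (int i = 0; i < cycle; i++) {
            input = wrap(input, factor);
        }
        return input;
    }

    public static void main(String[] args) {
        System.out.println(flatCycle("{}", 2, 2));   // prints [{},{},{},{}]
        System.out.println(nestedCycle("{}", 2, 2)); // prints [[{},{}],[{},{}]]
    }
}
```

Both modes contain the same number of copies of the original content after a given cycle; only the nesting depth differs, which stresses a parser's handling of deep structures rather than long ones.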

Test cycles are repeated until an OutOfMemoryError occurs, or until a parser requires more than twenty times the requested parse time. The default parse time is 10 seconds, so if a parser needs 200 seconds or more, the test stops.
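The termination rule might look like the following sketch. The class `CycleLoop` and its methods are hypothetical. The text leaves open whether exactly twenty times the requested time already stops the test; this sketch follows the "200 seconds or more" example and uses `>=`, which is an assumption.

```java
import java.util.function.IntToDoubleFunction;

final class CycleLoop {
    /** A parser may take up to this multiple of the requested parse time. */
    static final int LIMIT_FACTOR = 20;

    /** True if the measured time reaches the limit (assumed inclusive). */
    static boolean exceedsLimit(double requestedSeconds, double measuredSeconds) {
        return measuredSeconds >= LIMIT_FACTOR * requestedSeconds;
    }

    /**
     * Runs test cycles 0, 1, 2, ... where runCycle returns the slowest parser's
     * time in seconds for that cycle. Stops when the limit is reached or when
     * the JVM runs out of memory. Returns the number of completed cycles.
     */
    static int runCycles(IntToDoubleFunction runCycle, double requestedSeconds) {
        int completed = 0;
        try {
            while (true) {
                double seconds = runCycle.applyAsDouble(completed);
                completed++;
                if (exceedsLimit(requestedSeconds, seconds)) {
                    break;
                }
            }
        } catch (OutOfMemoryError e) {
            // The input has outgrown the heap; the cycle in progress does not count.
        }
        return completed;
    }

    public static void main(String[] args) {
        // Simulated parser whose time doubles each cycle, starting at 10 seconds.
        int cycles = runCycles(c -> 10.0 * Math.pow(2, c), 10.0);
        System.out.println(cycles + " cycles completed"); // prints 6 cycles completed
    }
}
```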

Benchmark results are dumped to these files:

  • throughput.csv
  • throughput.png
  • memory.csv
  • memory.png

Execution platforms

The benchmark covers two execution platforms:

  • Java - parsers generated as Java code for direct invocation from Java.
  • XQuery - parsers for use in XQuery, generated either as XQuery code or as Java code that is called as an external function from XQuery. They are executed on BaseX 10.8 or SaxonJ-HE 12.3.

Available parsers in Java

The result of all parsers for Java is a com.fasterxml.jackson.databind.JsonNode from the Jackson project. The result object represents the parsed JSON.

These parsers are available:

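| Parser Name | Generator                                         | Algorithm         |
|-------------|---------------------------------------------------|-------------------|
| Jackson     | Reference: com.fasterxml.jackson.core.JsonParser  |                   |
| HandCrafted |                                                   | recursive descent |
| REx_LL      | REx 5.57                                          | LL                |
| REx_LALR    | REx 5.57                                          | LALR              |
| JavaCC      | JavaCC 7.0.13                                     | LL                |
| ANTLR4      | ANTLR 4.13.1                                      | LL                |
| Grammatica  | Grammatica 1.6                                    | LL                |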

Available parsers in XQuery

The result of parsers for XQuery is an XML element as it would be produced by fn:json-to-xml (see definition in XPath and XQuery 3.1 Functions and Operators).

| Parser Name          | Generator    | Algorithm | Language | XQuery Processor                        |
|----------------------|--------------|-----------|----------|-----------------------------------------|
| BaseX                |              |           | Java     | BaseX (reference: fn:json-to-xml)       |
| BaseXRExLL           | REx 5.57     | LL        | XQuery   | BaseX                                   |
| BaseXRExLALR         | REx 5.57     | LALR      | XQuery   | BaseX                                   |
| BaseXRExLLExternal   | REx 5.57     | LL        | Java     | BaseX                                   |
| BaseXRExLALRExternal | REx 5.57     | LALR      | Java     | BaseX                                   |
| BaseXIxml            | Markup Blitz | GLR       | Java     | BaseX                                   |
| Saxon                |              |           | Java     | SaxonJ-HE (reference: fn:json-to-xml)   |
| SaxonRExLL           | REx 5.57     | LL        | XQuery   | SaxonJ-HE                               |
| SaxonRExLALR         | REx 5.57     | LALR      | XQuery   | SaxonJ-HE                               |
| SaxonRExLLExternal   | REx 5.57     | LL        | Java     | SaxonJ-HE                               |
| SaxonRExLALRExternal | REx 5.57     | LALR      | Java     | SaxonJ-HE                               |
| SaxonIxmlEarley      | CoffeeFilter | Earley    | Java     | SaxonJ-HE                               |

Building rex-parser-benchmark

Use Java 11 or higher to build.

For building rex-parser-benchmark, use these commands:

git clone https://github.com/GuntherRademacher/rex-parser-benchmark.git
cd rex-parser-benchmark 
gradlew build

After the project has been built with Gradle, it can also be imported into Eclipse.

Running rex-parser-benchmark

The benchmark can be run with the run task:

gradlew run

The above command uses all defaults, so it runs the Java parsers.

There are a number of command line options. These are shown when passing -? as an argument:

gradlew run "--args=-?"

This results in:

Usage: java Benchmark <OPTION>... [<FILE>|<DIRECTORY>]

  read JSON file, or all *.json files in given directory (default: current dir). Restrict
  to those that are parseable by all parsers. Parse repeatedly. Log execution time.

  Options:

  -?, --help               show this message
  --platform [java|xquery] use Java or XQuery parser set
                             (default: java)
  --exclude <PARSER>       exclude <PARSER> from test
  --include <PARSER>       include <PARSER> in test
  --novalidation           skip comparison of parsing results
  --create-result          create result JsonObject (Java only,
                             XQuery always is executed with results)
  --warmup <TIME>          warm up parsers for <TIME> seconds
                             (default: 10)
  --time <TIME>            run each parser for <TIME> seconds
                             (default: 10)
  --factor <FACTOR>        increase input size by <FACTOR>
                             after each test cycle (default 2)
  --nest                   nest JSON arrays, when increasing input size. By
                             default, a single top level array will be used.
  --heapdump <SIZE>        dump heap when reaching <SIZE> (may contain fraction
                             and unit MB or GB) to file java_<PID>.hprof.

So for running the XQuery benchmark on file src/main/resources/8KB.json, use this command:

gradlew run --args="--platform xquery src/main/resources/8KB.json"

Benchmark results - Java

[Figure: throughput-java]

[Figure: memory-java]

Benchmark results - XQuery

[Figure: throughput-xquery]

[Figure: memory-xquery]

License

This project is licensed under the Apache License 2.0.