REx Parser Benchmark

This benchmark compares the performance of parsers generated by the REx Parser Generator with that of parsers from other parser generators. Each parser's task is to parse JSON input and create a result data structure. All parsers in a test must create the same result.

Test plan

After a warm-up phase, the parsers are executed one after another on the given input, each for a minimum runtime (parsing repeatedly, if time permits). Runtime and memory usage are collected and logged.
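The measurement loop described above could look roughly like the following sketch. It is not taken from the benchmark sources: the class `TimedRunSketch`, its `Measurement` type, and the `runFor` method are hypothetical names, and reading heap usage from `Runtime` right after the loop is only one rough way to observe memory.

```java
import java.util.function.Consumer;

// Hypothetical sketch: class and method names are illustrative, not from the benchmark sources.
final class TimedRunSketch {

  /** Result of one measured run. */
  static final class Measurement {
    final long iterations;     // number of completed parses
    final long elapsedNanos;   // wall-clock time spent in the loop
    final long usedHeapBytes;  // heap in use right after the loop (a rough indicator only)

    Measurement(long iterations, long elapsedNanos, long usedHeapBytes) {
      this.iterations = iterations;
      this.elapsedNanos = elapsedNanos;
      this.usedHeapBytes = usedHeapBytes;
    }

    /** Input characters parsed per second. */
    double charsPerSecond(int inputLength) {
      return (double) inputLength * iterations / (elapsedNanos / 1e9);
    }
  }

  /** Parses the input at least once, then repeats until minNanos have elapsed. */
  static Measurement runFor(Consumer<String> parser, String input, long minNanos) {
    long iterations = 0;
    long start = System.nanoTime();
    long elapsed;
    do {
      parser.accept(input);
      ++iterations;
      elapsed = System.nanoTime() - start;
    } while (elapsed < minNanos);
    Runtime runtime = Runtime.getRuntime();
    long usedHeap = runtime.totalMemory() - runtime.freeMemory();
    return new Measurement(iterations, elapsed, usedHeap);
  }

  public static void main(String[] args) {
    String input = "[1,2,3]";
    // Stand-in for a real parser: it only scans the input.
    Consumer<String> parser = s -> s.chars().filter(c -> c == ',').count();
    runFor(parser, input, 50_000_000L);                   // warm-up, result discarded
    Measurement m = runFor(parser, input, 100_000_000L);  // measured run
    System.out.printf("%d parses, %.0f chars/s, %d bytes heap in use%n",
        m.iterations, m.charsPerSecond(input.length()), m.usedHeapBytes);
  }
}
```

Because the loop checks the clock only after a complete parse, a single slow parse can run well beyond the requested minimum runtime.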

Before the next test cycle, the input size is increased by a given factor. This is done by wrapping multiple instances of each input file's content into JSON arrays. By default, a single flat top-level array is used, but there is an option to nest arrays deeply, i.e. to wrap multiple instances of the previous cycle's array into a new one.
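The difference between the flat and the nested variant can be illustrated with a small sketch. The class `InputGrowthSketch` and its methods are hypothetical, the factor is assumed to be an integer, and how the benchmark treats the very first cycle is an assumption as well.

```java
import java.util.Collections;

// Hypothetical sketch of the two input growth strategies; not the benchmark's actual code.
final class InputGrowthSketch {

  /** Wraps the given number of copies of a JSON text into one JSON array. */
  static String wrap(String json, int copies) {
    return "[" + String.join(",", Collections.nCopies(copies, json)) + "]";
  }

  /** Flat mode: a single top-level array holding factor^cycle copies of the original. */
  static String flatCycle(String original, int factor, int cycle) {
    int copies = 1;
    for (int i = 0; i < cycle; ++i) {
      copies *= factor;
    }
    return wrap(original, copies);
  }

  /** Nested mode: each cycle wraps factor copies of the previous cycle's input. */
  static String nestedCycle(String original, int factor, int cycle) {
    String current = original;
    for (int i = 0; i < cycle; ++i) {
      current = wrap(current, factor);
    }
    return current;
  }

  public static void main(String[] args) {
    System.out.println(flatCycle("{}", 2, 2));    // [{},{},{},{}]
    System.out.println(nestedCycle("{}", 2, 2));  // [[{},{}],[{},{}]]
  }
}
```

Both variants contain the same number of copies of the original content after a given cycle, but the nesting depth of the nested variant grows by one level per cycle, which puts additional pressure on parsers that recurse per array level.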

Test cycles are repeated until an OutOfMemoryError occurs, or until a parser requires more than twenty times the requested parse time. The default parse time is 10 seconds, so with default settings the test stops as soon as a parser needs more than 200 seconds.
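The stopping rule can be sketched as follows. This is an illustration under assumptions, not the benchmark's implementation: the `Cycle` interface and the class `StopCriterionSketch` are hypothetical, and the limit is treated as a strict "more than" comparison.

```java
// Hypothetical sketch of the termination logic; names are not from the benchmark sources.
final class StopCriterionSketch {

  /** A parser may take at most this many times the requested parse time. */
  static final int TIME_LIMIT_MULTIPLIER = 20;

  /** One test cycle; returns the slowest single parse time in seconds. */
  interface Cycle {
    double run(int cycleIndex);
  }

  static boolean exceedsLimit(double parseSeconds, double requestedSeconds) {
    return parseSeconds > TIME_LIMIT_MULTIPLIER * requestedSeconds;
  }

  /** Runs cycles until the time limit is exceeded or memory runs out; returns completed cycles. */
  static int runUntilLimit(Cycle cycle, double requestedSeconds) {
    int completed = 0;
    try {
      while (true) {
        double slowest = cycle.run(completed);
        ++completed;
        if (exceedsLimit(slowest, requestedSeconds)) {
          break;
        }
      }
    } catch (OutOfMemoryError e) {
      // The input no longer fits into the heap: end the benchmark normally.
    }
    return completed;
  }

  public static void main(String[] args) {
    // Simulated cycles whose slowest parse takes 0, 100, 200, 300, ... seconds.
    int cycles = runUntilLimit(index -> index * 100.0, 10.0);
    System.out.println(cycles + " cycles completed");
  }
}
```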

Benchmark results are dumped to these files:

throughput.csv
throughput.png
memory.csv
memory.png

Execution platforms

The benchmark covers two execution platforms:

Java - parsers generated as Java code for direct invocation from Java.
XQuery - parsers generated for use in XQuery, either as XQuery code or as Java code that is called as an external function from XQuery. They are executed on BaseX 10.8 or SaxonJ-HE 12.3.

Available parsers in Java

The result of all parsers for Java is a com.fasterxml.jackson.databind.JsonNode from the Jackson project. The result object represents the parsed JSON.

These parsers are available:

Parser Name	Generator	Algorithm	Remarks
`Jackson`			Reference: com.fasterxml.jackson.core.JsonParser
`HandCrafted`		recursive descent
`REx_LL`	REx 5.57	LL
`REx_LALR`	REx 5.57	LALR
`JavaCC`	JavaCC 7.0.13	LL
`ANTLR4`	ANTLR 4.13.1	LL
`Grammatica`	Grammatica 1.6	LL

Available parsers in XQuery

The result of parsers for XQuery is an XML element as it would be produced by fn:json-to-xml (see definition in XPath and XQuery 3.1 Functions and Operators).

Parser Name	Generator	Algorithm	Language	XQuery Processor	Remarks
`BaseX`			Java	BaseX	Reference: `fn:json-to-xml`
`BaseXRExLL`	REx 5.57	LL	XQuery	BaseX
`BaseXRExLALR`	REx 5.57	LALR	XQuery	BaseX
`BaseXRExLLExternal`	REx 5.57	LL	Java	BaseX
`BaseXRExLALRExternal`	REx 5.57	LALR	Java	BaseX
`BaseXIxml`	Markup Blitz	GLR	Java	BaseX
`Saxon`			Java	SaxonJ-HE	Reference: `fn:json-to-xml`
`SaxonRExLL`	REx 5.57	LL	XQuery	SaxonJ-HE
`SaxonRExLALR`	REx 5.57	LALR	XQuery	SaxonJ-HE
`SaxonRExLLExternal`	REx 5.57	LL	Java	SaxonJ-HE
`SaxonRExLALRExternal`	REx 5.57	LALR	Java	SaxonJ-HE
`SaxonIxmlEarley`	CoffeeFilter	Earley	Java	SaxonJ-HE

Building rex-parser-benchmark

Use Java 11 or higher to build.

To build rex-parser-benchmark, use these commands:

git clone https://github.com/GuntherRademacher/rex-parser-benchmark.git
cd rex-parser-benchmark 
gradlew build

After the project has been built with Gradle, it can also be imported into Eclipse.

Running rex-parser-benchmark

The benchmark can be run with the run task:

gradlew run

The above command uses all defaults and runs the Java parsers.

There are a number of command line options. These are shown when passing -? as an argument:

gradlew run "--args=-?"

This results in:

Usage: java Benchmark <OPTION>... [<FILE>|<DIRECTORY>]

  read JSON file, or all *.json files in given directory (default: current dir). Restrict
  to those that are parseable by all parsers. Parse repeatedly. Log execution time.

  Options:

  -?, --help               show this message
  --platform [java|xquery] use Java or XQuery parser set
                             (default: java)
  --exclude <PARSER>       exclude <PARSER> from test
  --include <PARSER>       include <PARSER> in test
  --novalidation           skip comparison of parsing results
  --create-result          create result JsonObject (Java only,
                             XQuery always is executed with results)
  --warmup <TIME>          warm up parsers for <TIME> seconds
                             (default: 10)
  --time <TIME>            run each parser for <TIME> seconds
                             (default: 10)
  --factor <FACTOR>        increase input size by <FACTOR>
                             after each test cycle (default 2)
  --nest                   nest JSON arrays, when increasing input size. By
                             default, a single top level array will be used.
  --heapdump <SIZE>        dump heap when reaching <SIZE> (may contain fraction
                             and unit MB or GB) to file java_<PID>.hprof.

To run the XQuery benchmark on the file src/main/resources/8KB.json, use this command:

gradlew run --args="--platform xquery src/main/resources/8KB.json"
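The usage text above states that the --heapdump size may contain a fraction and a unit of MB or GB. How the benchmark actually interprets such a value is not documented here, so the following is only a sketch of one plausible reading. The class `HeapSizeSketch` is hypothetical, and it assumes binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes) and treats a value without a unit as bytes.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the benchmark's own option parsing may differ.
final class HeapSizeSketch {

  // A number with an optional fraction, followed by an optional unit MB or GB.
  private static final Pattern SIZE =
      Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*(MB|GB)?", Pattern.CASE_INSENSITIVE);

  /** Converts a size such as "512MB" or "1.5GB" to a number of bytes. */
  static long parseBytes(String text) {
    Matcher matcher = SIZE.matcher(text.trim());
    if (!matcher.matches()) {
      throw new IllegalArgumentException("invalid size: " + text);
    }
    double value = Double.parseDouble(matcher.group(1));
    String unit = matcher.group(2) == null ? "" : matcher.group(2).toUpperCase(Locale.ROOT);
    // Assumption: binary units, and a missing unit means bytes.
    long multiplier = unit.equals("GB") ? 1L << 30 : unit.equals("MB") ? 1L << 20 : 1L;
    return (long) (value * multiplier);
  }

  public static void main(String[] args) {
    System.out.println(parseBytes("1.5GB"));  // 1610612736
    System.out.println(parseBytes("512MB"));  // 536870912
  }
}
```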

Benchmark results - Java

Benchmark results - XQuery

License

This project is subject to the Apache 2 License.