Neo4j Traversals

This repository contains the code for a Neo4j unmanaged extension demonstrating a possible performance regression in the traversals between Neo4j 2.2 and 2.3, probably related to the removal of the object cache.

Installation

Clone this repository
Build it locally using Maven and a 1.7 JDK:
```
 mvn clean package
```
Copy the resulting artifact to the plugins directory in your Neo4j installation:
```
 cp target/traversal-perfs-1.0-SNAPSHOT.jar /path/to/neo4j/plugins/
```

Configure Neo4j to load the unmanage extension by adding in conf/neo4j-server.properties:

 org.neo4j.server.thirdparty_jaxrs_classes=com.ekino.neo4j.traversal=/traversal-perfs

Usage

Start your Neo4j instance, if possible with an empty database:
```
 bin/neo4j start
```
Populate the database with a traversable tree.
```
 curl localhost:7474/traversal-perfs/populate
```
The tree consists of a root node (with the :Root and :A labels), connected to :B nodes through a HAS_B relationship, connected to more :A nodes through a HAS_A relationship. By default, the tree has a depth of 5 such "levels", and a fanout factor of 4 (the number of children nodes for a parent node), i.e. 1398101 nodes. Those parameters can be modified using query parameters, "depth" and "fanout":
```
 curl localhost:7474/traversal-perfs/populate?depth=8&fanout=2
```
The database needs to be emptied before a new tree can be generated.

Measure the response time.

 # Default breadth-first traversal mode
 ./run.sh
 # or explicit depth-first traversal mode
 ./run.sh --depth-first

Traversal implementation

The code only has 5 source files, and the most relevant part is TrueBNodesCounter which implements a traversal using custom Evaluator and PathExpander, based on the core Neo4j API: Node::hasLabel, Node::getProperty, Node::getRelationships.

Profiling (using Yourkit) shows that the cost of all these API calls has increased between 2.2 and 2.3, and that more garbage is generated by a request.

Results

The times are in milliseconds.

Version	Neo4j Enterprise 2.2.7						Neo4j Enterprise 2.3.2
Traversal mode	Breadth-First			Depth-First			Breadth-First			Depth-First
Heap (MB)	1024	2048	4096	1024	2048	4096	1024	2048	4096	1024	2048	4096
Mean	6616	3441	2045	2229	2209	1276	2771	2500	2465	1544	1543	1540
50%	6308	3281	2022	2228	2208	1277	2748	2487	2443	1549	1517	1547
90%	8520	3823	2296	2406	2415	1391	2981	2672	2674	1679	1690	1693
95%	8820	5268	2353	2454	2488	1424	3145	2773	2709	1711	1727	1716
Max	10591	5807	2856	2485	2795	1513	3722	3321	2936	1882	1771	1757

Benchmarking pitfalls

After some discussion on Slack with Michael Hunger (from Neo Technology) who came up with different results in a slightly different context, it turns out the benchmark between Neo4j 2.2 and 2.3 was not a fair comparison, as it was comparing apples and oranges: because the same traversal is called repeatedly (i.e. a synthetic benchmark) and the dataset fits in memory, the 2.2 run is not actually benchmarking a raw traversal but a traversal with lots of cache (using the node and relationship caches). In that perfect scenario, the repeated traversal is indeed faster.

For a fairer comparison, I've added a new service to clear the caches, then re-run the best-case scenario for 2.2 clearing the cache between each traversal using a new switch in the custom script:

./run.sh --depth-first --clear

The new results show that with an empty object cache, Neo4j 2.2 performs much worse than 2.3. So, is 2.3 actually better or worse than 2.2? It depends... It's way better when the cache is cold, but worse when it's warm and everything fits in. Since there's no going back, the increased cost of some operations (e.g. Node::hasLabel) will have to be mitigated at the application level by locally introducing some cache.

Results

The times are in milliseconds.

Version	Neo4j Enterprise 2.2.7, Depth-First		Neo4j Enterprise 2.3.2, Depth-First
Heap (MB)	4096	4096 w/ cache clear	4096
Mean	1276	2938	1540
50%	1277	2889	1547
90%	1391	3178	1693
95%	1424	3238	1716
Max	1513	3642	1757

Caching by yourself

It's possible to cache the results of Node::hasLabel or Node::getProperty between queries to reduce the I/O and memory consumption, even if the page cache helps:

./run.sh --depth-first --cache label
# or
./run.sh --depth-first --cache property
# or
./run.sh --depth-first --cache label,property

Obviously, this increases the memory footprint of the application (working set), and will have an effect on the duration of the garbage collection, but because it reduces the I/O and thus its use of buffers, it decreases the garbage generated per request (i.e. the memory consumed by the request). The test implementations of the caches are unbounded, and effectively duplicate part of the database in memory.

Results

The times are in milliseconds.

Version	Neo4j Enterprise 2.3.2, Depth-First
Heap (MB)	4096	4096 w/ label cache	4096 w/ property cache	4096 w/ label & property caches
Mean	1540	1209	1387	1067
50%	1547	1207	1384	1053
90%	1693	1229	1427	1100
95%	1716	1241	1432	1192
Max	1757	1284	1447	1248
Working set (MB)	24	42	40	58
Garbage per request (MB)	76	46	73	43

Configurations

MacBook Pro Core i7 2.2 GHz, 16 GB

Oracle 64-bit JDK 1.8.0_74

data/graph.db takes 402 MB.

dbms.pagecache.memory=500m

Licensed under the Apache License, Version 2.0

fpavageau/traversal-perfs

Neo4j Traversals

Installation

Usage

Traversal implementation

Results

Benchmarking pitfalls

Results

Caching by yourself

Results

Configurations