This repository contains the code for a Neo4j unmanaged extension demonstrating a possible performance regression in the traversals between Neo4j 2.2 and 2.3, probably related to the removal of the object cache.
-
Clone this repository
-
Build it locally using Maven and a 1.7 JDK:
mvn clean package
-
Copy the resulting artifact to the
plugins
directory in your Neo4j installation:cp target/traversal-perfs-1.0-SNAPSHOT.jar /path/to/neo4j/plugins/
-
Configure Neo4j to load the unmanage extension by adding in
conf/neo4j-server.properties
:org.neo4j.server.thirdparty_jaxrs_classes=com.ekino.neo4j.traversal=/traversal-perfs
-
Start your Neo4j instance, if possible with an empty database:
bin/neo4j start
-
Populate the database with a traversable tree.
curl localhost:7474/traversal-perfs/populate
The tree consists of a root node (with the
:Root
and:A
labels), connected to:B
nodes through aHAS_B
relationship, connected to more:A
nodes through aHAS_A
relationship. By default, the tree has a depth of 5 such "levels", and a fanout factor of 4 (the number of children nodes for a parent node), i.e. 1398101 nodes. Those parameters can be modified using query parameters, "depth" and "fanout":curl localhost:7474/traversal-perfs/populate?depth=8&fanout=2
The database needs to be emptied before a new tree can be generated.
-
Measure the response time.
# Default breadth-first traversal mode ./run.sh # or explicit depth-first traversal mode ./run.sh --depth-first
The code only has 5 source files, and the most relevant part is
TrueBNodesCounter
which
implements a traversal using custom Evaluator
and PathExpander
, based on the core Neo4j API: Node::hasLabel
,
Node::getProperty
, Node::getRelationships
.
Profiling (using Yourkit) shows that the cost of all these API calls has increased between 2.2 and 2.3, and that more garbage is generated by a request.
The times are in milliseconds.
Version | Neo4j Enterprise 2.2.7 | Neo4j Enterprise 2.3.2 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Traversal mode | Breadth-First | Depth-First | Breadth-First | Depth-First | ||||||||
Heap (MB) | 1024 | 2048 | 4096 | 1024 | 2048 | 4096 | 1024 | 2048 | 4096 | 1024 | 2048 | 4096 |
Mean | 6616 | 3441 | 2045 | 2229 | 2209 | 1276 | 2771 | 2500 | 2465 | 1544 | 1543 | 1540 |
50% | 6308 | 3281 | 2022 | 2228 | 2208 | 1277 | 2748 | 2487 | 2443 | 1549 | 1517 | 1547 |
90% | 8520 | 3823 | 2296 | 2406 | 2415 | 1391 | 2981 | 2672 | 2674 | 1679 | 1690 | 1693 |
95% | 8820 | 5268 | 2353 | 2454 | 2488 | 1424 | 3145 | 2773 | 2709 | 1711 | 1727 | 1716 |
Max | 10591 | 5807 | 2856 | 2485 | 2795 | 1513 | 3722 | 3321 | 2936 | 1882 | 1771 | 1757 |
After some discussion on Slack with Michael Hunger (from Neo Technology) who came up with different results in a slightly different context, it turns out the benchmark between Neo4j 2.2 and 2.3 was not a fair comparison, as it was comparing apples and oranges: because the same traversal is called repeatedly (i.e. a synthetic benchmark) and the dataset fits in memory, the 2.2 run is not actually benchmarking a raw traversal but a traversal with lots of cache (using the node and relationship caches). In that perfect scenario, the repeated traversal is indeed faster.
For a fairer comparison, I've added a new service to clear the caches, then re-run the best-case scenario for 2.2 clearing the cache between each traversal using a new switch in the custom script:
./run.sh --depth-first --clear
The new results show that with an empty object cache, Neo4j 2.2 performs much worse than 2.3. So, is 2.3 actually better
or worse than 2.2? It depends... It's way better when the cache is cold, but worse when it's warm and everything fits
in. Since there's no going back, the increased cost of some operations (e.g. Node::hasLabel
) will have to be mitigated
at the application level by locally introducing some cache.
The times are in milliseconds.
Version | Neo4j Enterprise 2.2.7, Depth-First | Neo4j Enterprise 2.3.2, Depth-First | |
---|---|---|---|
Heap (MB) | 4096 | 4096 w/ cache clear |
4096 |
Mean | 1276 | 2938 | 1540 |
50% | 1277 | 2889 | 1547 |
90% | 1391 | 3178 | 1693 |
95% | 1424 | 3238 | 1716 |
Max | 1513 | 3642 | 1757 |
It's possible to cache the results of Node::hasLabel
or Node::getProperty
between queries to reduce
the I/O and memory consumption, even if the page cache helps:
./run.sh --depth-first --cache label
# or
./run.sh --depth-first --cache property
# or
./run.sh --depth-first --cache label,property
Obviously, this increases the memory footprint of the application (working set), and will have an effect on the duration of the garbage collection, but because it reduces the I/O and thus its use of buffers, it decreases the garbage generated per request (i.e. the memory consumed by the request). The test implementations of the caches are unbounded, and effectively duplicate part of the database in memory.
The times are in milliseconds.
Version | Neo4j Enterprise 2.3.2, Depth-First | |||
---|---|---|---|---|
Heap (MB) | 4096 | 4096 w/ label cache |
4096 w/ property cache |
4096 w/ label & property caches |
Mean | 1540 | 1209 | 1387 | 1067 |
50% | 1547 | 1207 | 1384 | 1053 |
90% | 1693 | 1229 | 1427 | 1100 |
95% | 1716 | 1241 | 1432 | 1192 |
Max | 1757 | 1284 | 1447 | 1248 |
Working set (MB) | 24 | 42 | 40 | 58 |
Garbage per request (MB) | 76 | 46 | 73 | 43 |
MacBook Pro Core i7 2.2 GHz, 16 GB
Oracle 64-bit JDK 1.8.0_74
data/graph.db
takes 402 MB.
dbms.pagecache.memory=500m
Licensed under the Apache License, Version 2.0