Neo4j Traversals

This repository contains the code for a Neo4j unmanaged extension demonstrating a possible performance regression in traversals between Neo4j 2.2 and 2.3, probably related to the removal of the object cache.

Installation

  1. Clone this repository

  2. Build it locally using Maven and a 1.7 JDK:

     mvn clean package
    
  3. Copy the resulting artifact to the plugins directory in your Neo4j installation:

     cp target/traversal-perfs-1.0-SNAPSHOT.jar /path/to/neo4j/plugins/
    
  4. Configure Neo4j to load the unmanaged extension by adding the following line to conf/neo4j-server.properties (a sketch of a matching resource class follows this list):

     org.neo4j.server.thirdparty_jaxrs_classes=com.ekino.neo4j.traversal=/traversal-perfs
    
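For reference, an unmanaged extension is simply a set of JAX-RS resource classes living in the configured package. Below is a minimal, hypothetical sketch of what the populate endpoint used in the Usage section could look like; the class name, defaults and response body are assumptions, only the mount point and the endpoint path come from this README.

import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;

import org.neo4j.graphdb.GraphDatabaseService;

// Hypothetical sketch, not the repository's actual resource class.
@Path("/populate")
public class PopulateResourceSketch {

    private final GraphDatabaseService db;

    // Neo4j injects the embedded database into unmanaged extension resources
    public PopulateResourceSketch(@Context GraphDatabaseService db) {
        this.db = db;
    }

    @GET
    public Response populate(@DefaultValue("5") @QueryParam("depth") int depth,
                             @DefaultValue("4") @QueryParam("fanout") int fanout) {
        // ... build the tree described in the Usage section ...
        return Response.ok("populated depth=" + depth + " fanout=" + fanout).build();
    }
}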

Usage

  1. Start your Neo4j instance, if possible with an empty database:

     bin/neo4j start
    
  2. Populate the database with a traversable tree.

     curl localhost:7474/traversal-perfs/populate
    

    The tree consists of a root node (carrying the :Root and :A labels) connected to :B nodes through HAS_B relationships; each :B node is in turn connected to more :A nodes through HAS_A relationships, and so on. By default the tree has a depth of 5 such "levels" and a fanout of 4 (the number of child nodes per parent), i.e. 1398101 nodes in total. Both parameters can be overridden with the "depth" and "fanout" query parameters (a sketch of the tree-building code follows these Usage steps):

     curl 'localhost:7474/traversal-perfs/populate?depth=8&fanout=2'
    

    The database needs to be emptied before a new tree can be generated.

  3. Measure the response time.

     # Default breadth-first traversal mode
     ./run.sh
     # or explicit depth-first traversal mode
     ./run.sh --depth-first
    
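The tree-building logic itself is not shown in this README; the following is a rough, hypothetical sketch, assuming each "level" adds a layer of :B children under an :A node and a layer of :A grandchildren under each :B node (which matches the 1398101 = (4^11 - 1) / 3 node count for depth=5, fanout=4). The class name and the single-transaction approach are assumptions; a real populate service would presumably batch its commits.

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;

// Hypothetical sketch of the tree construction, not the repository's actual code.
public class PopulateSketch {

    private static final Label ROOT = DynamicLabel.label("Root");
    private static final Label A = DynamicLabel.label("A");
    private static final Label B = DynamicLabel.label("B");
    private static final RelationshipType HAS_A = DynamicRelationshipType.withName("HAS_A");
    private static final RelationshipType HAS_B = DynamicRelationshipType.withName("HAS_B");

    public static void populate(GraphDatabaseService db, int depth, int fanout) {
        // Single transaction for brevity; a real populate service would batch commits.
        try (Transaction tx = db.beginTx()) {
            Node root = db.createNode(ROOT, A);
            addLevel(db, root, depth, fanout);
            tx.success();
        }
    }

    // Each "level": fanout :B children under an :A node, then fanout :A children under
    // each :B node. Total nodes = (fanout^(2*depth+1) - 1) / (fanout - 1),
    // i.e. 1398101 for depth=5 and fanout=4.
    private static void addLevel(GraphDatabaseService db, Node aNode, int remainingDepth, int fanout) {
        if (remainingDepth == 0) {
            return;
        }
        for (int i = 0; i < fanout; i++) {
            Node bNode = db.createNode(B);
            aNode.createRelationshipTo(bNode, HAS_B);
            for (int j = 0; j < fanout; j++) {
                Node childA = db.createNode(A);
                bNode.createRelationshipTo(childA, HAS_A);
                addLevel(db, childA, remainingDepth - 1, fanout);
            }
        }
    }
}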

Traversal implementation

The code has only 5 source files; the most relevant part is TrueBNodesCounter, which implements a traversal using a custom Evaluator and PathExpander based on the core Neo4j API: Node::hasLabel, Node::getProperty and Node::getRelationships.
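As a hedged illustration (not the actual TrueBNodesCounter code), a traversal built from a custom Evaluator and PathExpander on top of those core calls could look roughly like this; the label, the outgoing direction and the "true" property name are assumptions made for the sketch.

import org.neo4j.graphdb.Direction;
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Path;
import org.neo4j.graphdb.PathExpander;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.traversal.BranchState;
import org.neo4j.graphdb.traversal.Evaluation;
import org.neo4j.graphdb.traversal.Evaluator;
import org.neo4j.graphdb.traversal.TraversalDescription;

// Hypothetical sketch, not the repository's TrueBNodesCounter.
public class BNodeCounterSketch {

    private static final Label B = DynamicLabel.label("B");

    public static long countTrueBNodes(GraphDatabaseService db, Node root) {
        TraversalDescription traversal = db.traversalDescription()
            .depthFirst()                                   // or .breadthFirst(), as in run.sh
            .expand(new PathExpander<Object>() {
                @Override
                public Iterable<Relationship> expand(Path path, BranchState<Object> state) {
                    // Core API call: follow all outgoing relationships
                    return path.endNode().getRelationships(Direction.OUTGOING);
                }

                @Override
                public PathExpander<Object> reverse() {
                    return this;
                }
            })
            .evaluator(new Evaluator() {
                @Override
                public Evaluation evaluate(Path path) {
                    Node node = path.endNode();
                    // Core API calls: keep :B nodes whose (hypothetical) "true" property is set
                    boolean match = node.hasLabel(B)
                            && Boolean.TRUE.equals(node.getProperty("true", Boolean.FALSE));
                    return match ? Evaluation.INCLUDE_AND_CONTINUE : Evaluation.EXCLUDE_AND_CONTINUE;
                }
            });

        long count = 0;
        try (Transaction tx = db.beginTx()) {
            for (Node ignored : traversal.traverse(root).nodes()) {
                count++;
            }
            tx.success();
        }
        return count;
    }
}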

Profiling (using YourKit) shows that the cost of all these API calls has increased between 2.2 and 2.3, and that more garbage is generated per request.

Results

The times are in milliseconds.

Version        | Neo4j Enterprise 2.2.7                        | Neo4j Enterprise 2.3.2
Traversal mode | Breadth-First         | Depth-First           | Breadth-First         | Depth-First
Heap (MB)      | 1024  | 2048  | 4096  | 1024  | 2048  | 4096  | 1024  | 2048  | 4096  | 1024  | 2048  | 4096
Mean           | 6616  | 3441  | 2045  | 2229  | 2209  | 1276  | 2771  | 2500  | 2465  | 1544  | 1543  | 1540
50%            | 6308  | 3281  | 2022  | 2228  | 2208  | 1277  | 2748  | 2487  | 2443  | 1549  | 1517  | 1547
90%            | 8520  | 3823  | 2296  | 2406  | 2415  | 1391  | 2981  | 2672  | 2674  | 1679  | 1690  | 1693
95%            | 8820  | 5268  | 2353  | 2454  | 2488  | 1424  | 3145  | 2773  | 2709  | 1711  | 1727  | 1716
Max            | 10591 | 5807  | 2856  | 2485  | 2795  | 1513  | 3722  | 3321  | 2936  | 1882  | 1771  | 1757

Benchmarking pitfalls

After some discussion on Slack with Michael Hunger (from Neo Technology), who came up with different results in a slightly different context, it turned out that the benchmark between Neo4j 2.2 and 2.3 was not a fair comparison; it was comparing apples and oranges. Because the same traversal is called repeatedly (i.e. a synthetic benchmark) and the dataset fits in memory, the 2.2 run does not actually benchmark a raw traversal but a fully cached one (backed by the node and relationship caches). In that perfect scenario, the repeated traversal is indeed faster.

For a fairer comparison, I've added a new service to clear the caches, then re-run the best-case scenario for 2.2, clearing the caches between each traversal using a new switch in the custom script:

./run.sh --depth-first --clear

The new results show that with an empty object cache, Neo4j 2.2 performs much worse than 2.3. So, is 2.3 actually better or worse than 2.2? It depends: it's way better when the cache is cold, but worse when the cache is warm and everything fits in it. Since there's no going back, the increased cost of some operations (e.g. Node::hasLabel) will have to be mitigated at the application level by locally introducing some caching.

Results

The times are in milliseconds.

Version   | Neo4j Enterprise 2.2.7, Depth-First | Neo4j Enterprise 2.3.2, Depth-First
Heap (MB) | 4096 | 4096 w/ cache clear          | 4096
Mean      | 1276 | 2938                         | 1540
50%       | 1277 | 2889                         | 1547
90%       | 1391 | 3178                         | 1693
95%       | 1424 | 3238                         | 1716
Max       | 1513 | 3642                         | 1757

Caching by yourself

It's possible to cache the results of Node::hasLabel or Node::getProperty between queries to reduce I/O and memory consumption, even though the page cache already helps:

./run.sh --depth-first --cache label
# or
./run.sh --depth-first --cache property
# or
./run.sh --depth-first --cache label,property

Obviously, this increases the memory footprint of the application (its working set) and affects the duration of garbage collections, but because it reduces I/O and thus the use of buffers, it decreases the garbage generated per request (i.e. the memory consumed by a request). The test implementations of the caches are unbounded and effectively duplicate part of the database in memory.
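As an illustration of the idea (not the repository's actual cache implementation), an unbounded label cache could be as simple as a concurrent map keyed by node id, consulted before falling back to Node::hasLabel; the class and field names below are made up for the sketch.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;

// Hypothetical sketch of an unbounded, application-level label cache shared across requests.
public class LabelCacheSketch {

    // node id -> cached result of hasLabel(label); unbounded, like the test implementations
    private final Map<Long, Boolean> cache = new ConcurrentHashMap<>();
    private final Label label;

    public LabelCacheSketch(Label label) {
        this.label = label;
    }

    public boolean hasLabel(Node node) {
        Boolean cached = cache.get(node.getId());
        if (cached == null) {
            cached = node.hasLabel(label); // single core API call on a cache miss
            cache.put(node.getId(), cached);
        }
        return cached;
    }
}

A property cache can follow the same pattern with a Map<Long, Object> keyed by node id.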

Results

The times are in milliseconds.

Version                  | Neo4j Enterprise 2.3.2, Depth-First
Heap (MB)                | 4096 | 4096 w/ label cache | 4096 w/ property cache | 4096 w/ label & property caches
Mean                     | 1540 | 1209                | 1387                   | 1067
50%                      | 1547 | 1207                | 1384                   | 1053
90%                      | 1693 | 1229                | 1427                   | 1100
95%                      | 1716 | 1241                | 1432                   | 1192
Max                      | 1757 | 1284                | 1447                   | 1248
Working set (MB)         | 24   | 42                  | 40                     | 58
Garbage per request (MB) | 76   | 46                  | 73                     | 43

Configurations

MacBook Pro Core i7 2.2 GHz, 16 GB

Oracle 64-bit JDK 1.8.0_74

data/graph.db takes 402 MB.

dbms.pagecache.memory=500m

Licensed under the Apache License, Version 2.0