Neo4j Benchmark with Pokec Data Set

Data set available at: https://snap.stanford.edu/data/soc-Pokec.html

The dataset is not included in this repository because of its size.

Pokec Social Network Benchmark Dataset

Hardware & Major Environment
---------------------------------------

    Model Name: MacBook Pro
    Model Identifier: MacBookPro11,3
    Processor Name: Intel Core i7
    Processor Speed: 2.3 GHz
    Number of Processors: 1
    Total Number of Cores: 4
    L2 Cache (per Core): 256 KB
    L3 Cache: 6 MB
    Hyper-Threading Technology: Enabled
    Memory: 16 GB
    Java build 1.8.0_191

The following Python modules are required:

    sudo pip install tornado
    sudo pip install neo4j-driver
    sudo pip install requests

Install Neo4j
---------------------------------------

Make sure Neo4j is already installed and that your system paths are set so that Neo4j commands can be run from the terminal.

Configure Neo4j memory:

    $NEO4J_HOME/bin/neo4j-admin memrec

For this benchmark, I use the initial memory configuration of Neo4j.

To warm up the cache before executing any queries, first call "CALL apoc.warmup.run()" from cypher-shell. cypher-shell ships with Neo4j, and you can run this command once $NEO4J_HOME is set.

Append the 3 lines to $NEO4J_HOME/conf/neo4j.conf.

Move apoc-3.5.0.2-all.jar to $NEO4J_HOME/plugins/:

    mv apoc-3.5.0.2-all.jar $NEO4J_HOME/plugins/

Append the following line to $NEO4J_HOME/conf/neo4j.conf:

    dbms.security.procedures.unrestricted=apoc.*

Start the server:

    $NEO4J_HOME/bin/neo4j start

Stop the server:

    $NEO4J_HOME/bin/neo4j stop

Create or change the username and password in cypher-shell, then exit:

    $NEO4J_HOME/bin/cypher-shell
    user: neo4j
    pass: neo4j
    # change password
    neo4j> CALL dbms.changePassword('benchmark');
    # exit shell
    neo4j> :exit
    # log in again
    $NEO4J_HOME/bin/cypher-shell -u neo4j -p benchmark

Bulk Loading Raw Data
----------------------------------------

I load all nodes and relationships with the Neo4j Import API.
To load all raw data and create your graph database with this API, see the /queryscripts/import-data-from-csv.sh file. To reload all raw data, simply run ./load_scripts/load-in-one-step.sh. In one step, this script deletes all graph.db data, stops the Neo4j server, runs the import again, and starts Neo4j again.

Statistics While Loading All Raw Data
----------------------------------------

These statistics come directly from the console log.

    Available resources:
      Total machine memory: 16.00 GB
      Free machine memory: 8.25 GB
      Max heap memory : 3.56 GB
      Processors: 8
      Configured max memory: 11.20 GB
      High-IO: true

    (1/4) Node import 2019-12-21 16:30:56.588+0300
      Estimated number of nodes: 1.08 M
      Estimated disk space usage: 2.39 GB
      Estimated required memory usage: 1.01 GB

    (2/4) Relationship import 2019-12-21 16:31:06.925+0300
      Estimated number of relationships: 40.27 M
      Estimated disk space usage: 1.28 GB
      Estimated required memory usage: 1.02 GB

    (3/4) Relationship linking 2019-12-21 16:31:19.253+0300
      Estimated required memory usage: 1.01 GB

    (4/4) Post processing 2019-12-21 16:31:30.019+0300
      Estimated required memory usage: 1020.01 MB

    IMPORT DONE in 36s 435ms. (Total Time)
    Imported:
      1630472 nodes
      30524918 relationships
      39217359 properties
    Peak memory usage: 1.05 GB

Measure Neo4j Loaded Data Size
----------------------------------------

Execute this command:

    sudo du -hc $NEO4J_HOME/data/databases/graph.db/*store*

Creating Index on User(user_id) Property
----------------------------------------

This dataset has only one node type, Profile (imported as User). Therefore, I created an index on the user_id property of the User node type.
Execute the following command through cypher-shell:

    CREATE INDEX ON :User(user_id);
    0 rows available after 964 ms, consumed after another 0 ms
    Added 1 indexes

To drop the index on :User(user_id), execute the following command:

    DROP INDEX ON :User(user_id);

Total index size of the database: 404 KB. You can measure this size with the same command used earlier to measure the graph.db size.

Run Benchmark
----------------------------------------

Warm up Neo4j, wait until it finishes, and keep cypher-shell open (the warm-up may take a long time):

    neo4j> CALL apoc.warmup.run(true, true);

After warming up, I run each query 10 times and calculate the average execution time. The results are written to ./load_scripts/queryscripts/result-with-index/*resultquery*.txt and ./load_scripts/queryscripts/result-without-index/*resultquery*.txt.

Results Without Index
----------------------------------------

    # Query 1 "Find 10000 users with ids and get their ages"
    Average execution time: 2.3032222222222223 seconds

    # Query 2 "Return All Friend Relationship Count In Graph.db"
    Average execution time: 87.831 seconds

    # Query 3 "Find Friends Of 1000 users with ids and get friends' user_id"
    Average execution time: 4.0456999999999996 seconds

    # Query 4 "Find Friends Of Friends of 1000 users with ids"
    Average execution time: 45.91822222222222 seconds

    # Query 5 "Find 10000 users with ids and update their age by 1"
    Average execution time: 20.3614444444444445 seconds

    # Query 6 "Find Shortest Path Between 1000 Different Users"
    Average execution time: 14.773444444444445 seconds

Results With Index
----------------------------------------

    # Query 1 "Find 10000 users with ids and get their ages"
    Average execution time: 1.2232432 seconds

    # Query 2 "Return All Friend Relationship Count In Graph.db"
    Average execution time: 56.831 seconds

    # Query 3 "Find Friends Of 1000 users with ids and get friends' user_id"
    Average execution time: 3.27726 seconds

    # Query 4 "Find Friends Of Friends of 1000 users with ids"
    Average execution time: 32.63756 seconds

    # Query 5 "Find 10000 users with ids and update their age by 1"
    Average execution time: 17.86121 seconds

    # Query 6 "Find Shortest Path Between 1000 Different Users"
    Average execution time: 8.723723232 seconds

A detailed benchmark report can be found in ./benchmark-report.pdf.
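
Sketch: Timing a Query Over Repeated Runs
----------------------------------------

The measurement procedure described under the benchmark section (run each query several times and average the execution times) can be sketched in Python roughly as follows. This is a minimal illustration, not the script that produced the numbers above. The function names `time_runs`, `average_seconds`, and `make_neo4j_runner` are hypothetical, and the bolt URI and Cypher text in the usage note are assumptions; only the credentials (neo4j / benchmark) come from the setup steps.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_query: Callable[[], object], repeats: int = 10) -> List[float]:
    """Call run_query `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def average_seconds(durations: List[float]) -> float:
    """Arithmetic mean of the measured durations."""
    return statistics.mean(durations)


def make_neo4j_runner(uri: str, user: str, password: str, cypher: str) -> Callable[[], None]:
    """Hypothetical helper: build a zero-argument callable that runs one Cypher query.

    The import is local so the timing helpers above work without the driver installed.
    """
    from neo4j import GraphDatabase  # installed by the neo4j-driver package

    driver = GraphDatabase.driver(uri, auth=(user, password))

    def run() -> None:
        with driver.session() as session:
            # consume() drains the result so the full query cost is measured
            session.run(cypher).consume()

    return run
```

A possible use, assuming a local server on the default bolt port and an illustrative relationship-count query:

    runner = make_neo4j_runner("bolt://localhost:7687", "neo4j", "benchmark",
                               "MATCH ()-[r]->() RETURN count(r)")
    print(average_seconds(time_runs(runner, repeats=10)), "seconds")

Passing the query as a plain callable keeps the timing loop independent of the driver, so the same helper can time any of the six queries.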