This hands-on tutorial walks you through graph data analytics with NetworkX (for analysis) and Neo4j (as the graph database). You'll learn how to connect to Neo4j, import and model graph data, analyze graph structures, run essential graph calculations, and visualize your findings, using the Game of Thrones dataset as a case study.
You'll need the following:
- Python 3.6 or higher: The backbone of our analysis. Download it from python.org.
- Neo4j Desktop 1.5.9: Our graph database environment. Download it from neo4j.com.
- Database Version: Neo4j 5.12.0
- Plugins: APOC 5.12.0 and Graph Data Science Library 2.6.8. These should be installed in your Neo4j instance. (Instructions for plugin installation will be provided later in the tutorial.)
Create a Virtual Environment (Highly Recommended):
- Open your terminal and navigate to the directory where you've cloned this project.
- Create a virtual environment:
python3 -m venv myenv
- Activate the environment:
source myenv/bin/activate   # Linux/macOS
myenv\Scripts\activate      # Windows
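To confirm that the activation worked, a small check like the one below can be run with the interpreter you intend to use. It is only a sketch: the function names are made up for illustration, and the check relies on `sys.prefix` differing from `sys.base_prefix` inside an environment created with `python3 -m venv`.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the installation it was created from.
    return sys.prefix != sys.base_prefix


def meets_minimum_version(minimum=(3, 6)) -> bool:
    """Return True when the interpreter is at least the version the tutorial asks for."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtual_environment())
    print("Python 3.6 or higher:", meets_minimum_version())
```

If the second line prints `False`, the environment is probably not activated in the current terminal session.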
Install Dependencies:
- With your virtual environment activated, install the required libraries:
pip install -r requirements.txt
This will install NetworkX, the Neo4j Python driver, Jupyter Notebooks, and any visualization libraries listed in requirements.txt.
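After the installation finishes, it can be useful to check that the main libraries are importable from the active environment. The sketch below uses only the standard library. The module names in `EXPECTED_MODULES` are assumptions about what requirements.txt contains, so adjust them to match the actual file.

```python
from importlib.util import find_spec

# Assumed import names for NetworkX, the Neo4j driver, and Jupyter Notebook;
# these are guesses, not taken from requirements.txt.
EXPECTED_MODULES = ["networkx", "neo4j", "notebook"]


def missing_modules(names):
    """Return the module names from `names` that cannot be located for import."""
    # find_spec looks a top-level module up without importing it
    # and returns None when it is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All expected modules are available.")
```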
Launch Neo4j Desktop:
- Double-click the Neo4j Desktop icon to open the application.
Create a New Project:
- In Neo4j Desktop, click on "New Project."
- Name your project "Data Science Summer School 2024" and click "Create."
Create a New DBMS (Database Management System):
- In your new project, click on "New DBMS."
- Name it "Graph Data Analytics" and select Neo4j version 5.12.0.
- Click "Create" to set up the DBMS.
Create a Game of Thrones Database:
- Under the "Graph Data Analytics" DBMS, click on "New Database."
- Name it "GoT" and click "Create."
- Once created, click "Start" to activate the database.
- In these tutorials we will use the (default) username "neo4j" and the password "your_password".
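Once the database is running, the same credentials can be used from Python. The following is a minimal sketch rather than the tutorial's own connection code. It assumes the DBMS listens on the default local Bolt address `bolt://localhost:7687`, and that the database is addressed as `got` (Neo4j treats database names as case-insensitive and stores them in lowercase). The helper names `connection_settings` and `check_connection` are hypothetical.

```python
def connection_settings(password="your_password"):
    """Collect the connection details used in this tutorial in one place."""
    return {
        # Assumption: default Bolt address of a local Neo4j Desktop DBMS.
        "uri": "bolt://localhost:7687",
        "auth": ("neo4j", password),
        # Assumption: the "GoT" database is addressed by its lowercase name.
        "database": "got",
    }


def check_connection(settings):
    """Open a driver, verify connectivity, and run a trivial query."""
    # Imported here so that connection_settings() works even without the driver.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(settings["uri"], auth=settings["auth"]) as driver:
        driver.verify_connectivity()  # raises if the server cannot be reached
        with driver.session(database=settings["database"]) as session:
            return session.run("RETURN 1 AS ok").single()["ok"]


if __name__ == "__main__":
    # Requires a started database; should print 1 on success.
    print(check_connection(connection_settings()))
```

Keeping the settings in one function makes it easy to change the password or address later without touching the code that runs queries.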
Open Neo4j Browser:
- Click on the "Open" button next to your new "GoT" database. This will launch the Neo4j Browser, where you'll run Cypher queries to import and explore your graph data.
Import the Game of Thrones Dataset:
- Copy and paste the following Cypher commands into the Neo4j Browser one by one:
// Ensure unique character names
CREATE CONSTRAINT FOR (c:Character) REQUIRE c.name IS UNIQUE;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data/asoiaf-book1-edges.csv' AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS1]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book = 1;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data/asoiaf-book2-edges.csv' AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS2]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book = 2;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data/asoiaf-book3-edges.csv' AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS3]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book = 3;

LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/neo4j-examples/graphgists/master/browser-guides/data/asoiaf-book45-edges.csv' AS row
MERGE (src:Character {name: row.Source})
MERGE (tgt:Character {name: row.Target})
MERGE (src)-[r:INTERACTS45]->(tgt)
ON CREATE SET r.weight = toInteger(row.weight), r.book = 45;
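The four LOAD CSV statements above differ only in the book number, which appears in the file name, the relationship type, and the `book` property. As an illustration, the sketch below builds the same statements from one template in Python. The names `BASE_URL`, `BOOKS`, and `import_statement` are hypothetical and not part of the tutorial's code.

```python
BASE_URL = (
    "https://raw.githubusercontent.com/neo4j-examples/graphgists/"
    "master/browser-guides/data"
)
BOOKS = [1, 2, 3, 45]  # books 4 and 5 share one edge file


def import_statement(book):
    """Build the LOAD CSV statement for one book's edge file."""
    return (
        f"LOAD CSV WITH HEADERS FROM '{BASE_URL}/asoiaf-book{book}-edges.csv' AS row\n"
        # MERGE creates a Character node only if no node with that name exists yet.
        "MERGE (src:Character {name: row.Source})\n"
        "MERGE (tgt:Character {name: row.Target})\n"
        f"MERGE (src)-[r:INTERACTS{book}]->(tgt)\n"
        f"ON CREATE SET r.weight = toInteger(row.weight), r.book = {book}"
    )


if __name__ == "__main__":
    # Print the statements; they could also be passed one at a time to
    # session.run() of an open Neo4j driver session.
    for book in BOOKS:
        print(import_statement(book), end=";\n\n")
```

Because every node is created with MERGE and character names are constrained to be unique, a character who appears in several books ends up as a single node with several differently typed relationships.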
Verify Data Import:
- Run the following query. It should report 796 nodes:
MATCH (n) RETURN count(n) AS nodeCount; // expected: 796
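The same check can be scripted. The sketch below is one possible way to do it, with a hypothetical helper `node_count_matches` that accepts any callable that runs a Cypher query and returns a list of row dictionaries. With the Neo4j Python driver, such a callable could be `lambda q: session.run(q).data()` for an open session, which is an assumption about how you wire it up.

```python
EXPECTED_NODE_COUNT = 796  # value stated in the verification step above


def node_count_matches(run_query, expected=EXPECTED_NODE_COUNT):
    """Run the count query and compare the result with the expected value.

    `run_query` is any callable that takes a Cypher string and returns
    a list of dict-like rows.
    """
    rows = run_query("MATCH (n) RETURN count(n) AS nodeCount")
    actual = rows[0]["nodeCount"]
    return actual == expected, actual


if __name__ == "__main__":
    # Stand-in runner so the sketch works without a database;
    # replace it with a function backed by a real session.
    def fake_runner(query):
        return [{"nodeCount": 796}]

    ok, count = node_count_matches(fake_runner)
    print("import looks complete" if ok else f"unexpected node count: {count}")
```

A count below 796 usually suggests that one of the import statements did not run. Re-running it is safe because the import uses MERGE.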
All tasks, examples, and the corresponding documentation are provided in the folders task_0 through task_4.