This is the repository for data loading that accompanies the Graphing Relational Databases project with Neo4j (graph database) and MariaDB (relational database). All necessary pieces to replicate the project are linked below.
In this project, we will see how to translate part of all data from a relational format into a graph. Goals of this project include the following:
-
Learn the basics of migrating the model and practical steps for actually implementing it.
-
Understand naming and case conventions when migrating from tables to graphs, as well as some principles for a successful graph model.
-
Get concrete steps for graphing relational databases.
Each database is spun up using a Docker container. The linked custom images use the official image of each database as the base, then add necessary tools and expose ports preferred for each.
Instructions for getting each container up and running are included in the related repository.
The data set is sourced from a Kaggle repository, but it has been cleaned up a lot. Cleaned files are available in this repository under the /data
directory.
To accomplish loading the data, you will need to copy each of the csv files and scripts to the local directory that corresponds to the data directory defined in the Docker run script for each database.
For instance, I’ll need to put the csv files and both scripts for Neo4j in the same directory as specified in this line of the Dockerfile for my Neo4j container.
Next, we need to follow the steps outlined below for each data load.
-
SSH into container
docker exec -it mymaria bash
-
Change to expected directory (/var/lib/mysql)
cd /var/lib/mysql
-
Run
load_data.sh
script to import all data
-
Verify data (replace
<table>
with any table name)
mysql -uroot -pTesting123
MariaDB [(none)]> use coffee_shop; MariaDB [(none)]> show tables; MariaDB [(none)]> SELECT * FROM <table> LIMIT 10;
-
Exit both mysql shell and Docker container shell
exit exit
-
SSH into container
docker exec -it myneo4j bash
-
Change to expected directory
cd import
-
Run
load_data.sh
script to import all data
-
Verify data
cypher-shell -u neo4j -p Testing123
neo4j> MATCH (c:Customer) RETURN c LIMIT 10; neo4j> MATCH (n)-[rel]-(other) RETURN * LIMIT 10;
-
Exit both cypher shell and Docker container shell
:exit exit
NOTE: If this project has been run before on an older version, then when you create the updated container, I recommend cleaning out any data store files in the /data
directory for each database.
I.e. for MariaDB, anything that is not a .csv
, .sql
, or .sh
, including database instance directories like mysql, sys, etc.
For Neo4j, you can remove the transactions, databases, and dbms folders. (All the csv files are safe in the /import
directory)
-
Demo queries and session resources: connect-the-tables