/code-wars-import-data-scripts

Scripts and CSV files for importing the Northwind data set into MariaDB, MongoDB, and Neo4j Docker containers

Data load scripts for Code Wars project

This is the repository for data loading that accompanies the Code Wars project with Spring Data and MariaDB, MongoDB, and Neo4j.

All necessary pieces to replicate the project are linked below.

Project background

In this project, we are using Spring Boot and Spring Data to compare and contrast applications using different types of data stores.

Goals of this project include the following:

  1. Understand differences in data modeling for relational, document, and graph using a consistent data set.

  2. See how different databases can alter (or not alter) application development. This answers the question of whether application code needs to change based on the database chosen.

  3. See how Spring Data streamlines application development for the developer and puts the focus on which data store is right for the job, rather than on skills with a particular data store.

Docker containers

Each database is spun up using a Docker container. The linked custom images use the official image of each database as the base, then add necessary tools and expose ports preferred for each.

Instructions for getting each container up and running are included in the related repository.

Data load

That’s what this repository is for! To accomplish this piece, you will need to copy each of the CSV files and scripts to the local directory that corresponds to the data directory defined in the Docker run script for each database.

For instance, I’ll need to put the CSV files and both scripts for Neo4j in the directory specified in this line of the Dockerfile for my Neo4j container.
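As a rough sketch of that copy step, the function below copies the .csv, .sql, and .sh files from one directory into another. The name `stage_files` and the paths in the usage comment are hypothetical; substitute the local directory that your own Docker run script mounts as the data directory.

```shell
# Hypothetical helper (not part of this repository): copy the data files and
# load scripts into the host directory that the container mounts.
stage_files() {
  src="$1"
  dest="$2"
  [ -d "$src" ] || return 1
  mkdir -p "$dest" || return 1
  # Copy only CSV data, SQL files, and shell scripts from the top level of src.
  find "$src" -maxdepth 1 -type f \
    \( -name '*.csv' -o -name '*.sql' -o -name '*.sh' \) \
    -exec cp {} "$dest"/ \;
}

# Example (both paths are placeholders):
# stage_files ./neo4j "$HOME/docker/neo4j/import"
```

Other files in the source directory are left alone, so a full checkout can be pointed at without copying unrelated material into the data directory.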

Next, we need to follow the steps outlined below for each data load.

MariaDB

  1. Open a shell in the container

docker exec -it mymaria bash

  2. Change to expected directory (/var/lib/mysql)

cd /var/lib/mysql

  3. Run insert_northwind.sh script to import all data

  4. Verify data (replace <table> with any table name)

mysql -uroot -pTesting123

MariaDB [(none)]> use northwind
MariaDB [northwind]> show tables;
MariaDB [northwind]> SELECT * FROM <table> LIMIT 10;

  5. Exit both mysql shell and Docker container shell

exit
exit
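The same MariaDB steps can also be run from the host without opening an interactive shell. This is a minimal sketch, assuming that insert_northwind.sh can be started with bash from /var/lib/mysql; the function names are hypothetical, while the container name and credentials are the ones used in the steps above.

```shell
# Hypothetical wrappers around the MariaDB steps.
load_mariadb() {
  # Assumption: the load script runs with bash from the data directory.
  docker exec mymaria bash -c 'cd /var/lib/mysql && bash insert_northwind.sh'
}

check_mariadb() {
  # -e runs one statement and exits instead of opening the mysql prompt.
  docker exec mymaria mysql -uroot -pTesting123 northwind -e 'SHOW TABLES;'
}

# Usage:
# load_mariadb && check_mariadb
```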

MongoDB

  1. Open a shell in the container

docker exec -it mymongo bash

  2. Change to expected directory

cd data/db

  3. Run insert_northwind.sh script to import all data

  4. Verify data (sample lookups for each collection)

mongo -u mongoadmin -p Testing123

use northwind
show collections
db.customer.find({customerId:"VINET"}).pretty()
db.employee.find({employeeId:9}).pretty()
db.order.find({orderId:10614}).pretty()
db.product.find({productId:58}).pretty()

  5. Exit both mongo shell and Docker container shell

exit
exit
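For a quick check from the host, the mongo shell can run a single expression and exit. The sketch below lists the collections in the northwind database; `check_mongo` is a hypothetical name, and the container name and credentials are the ones shown above.

```shell
# Hypothetical wrapper: list the northwind collections without an interactive shell.
check_mongo() {
  # --quiet hides the connection banner; --eval runs the JavaScript and exits.
  # getSiblingDB switches database inside the expression, so the login is the
  # same as in the interactive command.
  docker exec mymongo mongo -u mongoadmin -p Testing123 --quiet \
    --eval 'printjson(db.getSiblingDB("northwind").getCollectionNames())'
}

# Usage:
# check_mongo
```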

Note: If you review the .log files for the updates to orders, you may notice a syntax error for a string literal. This happens because, for the variables that are strings, I write the opening double-quote character and then close the command's single quote so that the shell expands the variable. That leaves a double quote that looks unclosed (it is closed after the single quote that follows the variable). However, the script works properly and all the data is correct.
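The quoting pattern described in the note can be illustrated in isolation. This is only a sketch of the general technique, not the project's actual script; `build_find` is a hypothetical function, and the query reuses the customer lookup from the verification step.

```shell
# Hypothetical illustration of closing and reopening single quotes around a variable.
build_find() {
  customer_id="$1"
  # 1. The single-quoted JavaScript ends right after the opening ".
  # 2. "$customer_id" is expanded by the shell.
  # 3. The single-quoted JavaScript resumes with the closing ".
  printf '%s\n' 'db.customer.find({customerId: "'"$customer_id"'"})'
}

build_find VINET
# prints: db.customer.find({customerId: "VINET"})
```

The three pieces are concatenated by the shell into one argument, so the JavaScript that reaches the database has balanced double quotes even though each quoted piece, read alone, does not.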

Neo4j

  1. Open a shell in the container

docker exec -it myneo4j bash

  2. Change to expected directory

cd import

  3. Run insert_northwind.sh script to import all data

  4. Verify data

cypher-shell -u neo4j -p Testing123

neo4j> MATCH (c:Customer) RETURN c LIMIT 10;
neo4j> MATCH (n)-[rel]-(other) RETURN * LIMIT 10;

  5. Exit both cypher-shell and Docker container shell

:exit
exit
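cypher-shell also accepts a query as its final argument, runs it, and exits, which allows a one-line check from the host. In this sketch `check_neo4j` is a hypothetical name and the count query is an illustrative variation on the Customer query above.

```shell
# Hypothetical wrapper: count Customer nodes without an interactive session.
check_neo4j() {
  # The Cypher string passed as the last argument is executed, then cypher-shell exits.
  docker exec myneo4j cypher-shell -u neo4j -p Testing123 \
    'MATCH (c:Customer) RETURN count(c) AS customers;'
}

# Usage:
# check_neo4j
```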

Issues

NOTE: If this project has been run before on an older version, then when you create the updated container, I recommend cleaning out any data store files in the /data directory for each database.

For MariaDB, that means removing anything that is not a .csv, .sql, or .sh file, including database instance directories like mysql, sys, etc.

For Mongo, the same applies: remove the .wt files, as well as the diagnostic.data and journal folders.

For Neo4j, you can remove the transactions, databases, and dbms folders. (All the csv files are safe in the /import directory)
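For the MariaDB and Mongo cases, where everything except the .csv, .sql, and .sh files goes, the cleanup could be sketched as below. `clean_data_dir` is a hypothetical helper; it only lists what it would remove unless `delete` is passed as the second argument, and the path in the usage comment is a placeholder.

```shell
# Hypothetical helper: list (default) or delete everything at the top level of
# a data directory except the .csv, .sql, and .sh files.
clean_data_dir() {
  dir="$1"
  mode="${2:-list}"
  [ -d "$dir" ] || return 1
  if [ "$mode" = "delete" ]; then
    find "$dir" -mindepth 1 -maxdepth 1 \
      ! -name '*.csv' ! -name '*.sql' ! -name '*.sh' -exec rm -rf {} +
  else
    find "$dir" -mindepth 1 -maxdepth 1 \
      ! -name '*.csv' ! -name '*.sql' ! -name '*.sh' -print
  fi
}

# Example (path is a placeholder):
# clean_data_dir "$HOME/docker/mariadb/data"          # review the list first
# clean_data_dir "$HOME/docker/mariadb/data" delete   # then remove
```

Listing by default is a deliberate choice here, since the delete mode removes whole directories and cannot be undone.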

Presentation

A PDF version of the accompanying presentation is published to SpeakerDeck.