bootiful-music

The projects and articles in this repository present a "journey" from a relational database to a database with relations. We first have a look at a simple but powerful schema that represents musical listening habits. This schema lives in a PostgreSQL database, and its four entities give us enormous analytical possibilities when we apply modern SQL constructs. We will then see how to create a Graph inside Neo4j from the same data and explore which possibilities we gain by using real relationships.

We will use Spring Data Neo4j from within Spring Boot to interact with the Graph.

See this series of blog posts: From relational databases to databases with relations.

1. Talks

1.1. Devoxx Ukraine 2018

Direct link: Going from relational databases to databases with relations with Neo4j and Spring Data

2. The business domain

We’re dealing with the tracking of musical habits, much like LastFM. The author has been running a service that does exactly that on a relational database for some time. The schema is as follows:

@startuml
' see https://gist.github.com/QuantumGhost/0955a45383a0b6c0bc24f9654b3cb561

!define Table(name,desc) class name as "desc" << (T,#FFAAAA) >>
!define primary_key(x) <b>x</b>
!define unique(x) <color:green>x</color>
!define not_null(x) <u>x</u>

hide methods
hide stereotypes

Table(artists, "artists\n(Artists that have been played)") {
primary_key(id) INTEGER
not_null(unique(name)) VARCHAR[255]
}

Table(genres, "genres\n(Genres that have been played)") {
primary_key(id) INTEGER
not_null(unique(name)) VARCHAR[255]
}

Table(tracks, "tracks\n(Partially normalized track data)") {
primary_key(id) INTEGER
not_null(artist_id) INTEGER
not_null(genre_id) INTEGER
name VARCHAR[4000]
}

Table(plays, "plays\n(The actual play data)") {
primary_key(id) INTEGER
not_null(track_id) INTEGER
played_on DATETIME
}

' relationships (all one-to-many)
artists "1"-->"*" tracks : "A track has been released by an artist"
genres "1"-->"*" tracks : "A track has a genre"
tracks "1" --> "*" plays : "A track has been played several times"
@enduml

The graph looks a bit different:

TODO

2.1. Questions to be answered:

2.1.1. Statistics

What are the charts of this month?
Which genre has been played the most?
Which artist has the highest cumulative play count over time?
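
To make the first and the last question concrete, here is a minimal in-memory sketch in plain Java. It is not how the charts module answers them (that module works with SQL against PostgreSQL); the `Play` and `ChartsSketch` classes and the sample data are made up for illustration, with each `Play` standing in for a row of plays joined with its track and artist.

```java
import java.time.LocalDateTime;
import java.time.YearMonth;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ChartsSketch {

    // Made-up sample data, not taken from the real database.
    static List<Play> sample() {
        return List.of(
            new Play("Artist A", "One", LocalDateTime.of(2018, 10, 1, 8, 0)),
            new Play("Artist A", "One", LocalDateTime.of(2018, 10, 2, 8, 0)),
            new Play("Artist A", "One", LocalDateTime.of(2018, 10, 3, 8, 0)),
            new Play("Artist B", "Two", LocalDateTime.of(2018, 10, 5, 9, 0)),
            new Play("Artist B", "Two", LocalDateTime.of(2018, 10, 6, 9, 0)),
            new Play("Artist A", "Three", LocalDateTime.of(2018, 9, 30, 23, 0)),
            new Play("Artist B", "Two", LocalDateTime.of(2018, 11, 1, 7, 0)));
    }

    // "The charts of this month": the n most played tracks within the given month.
    // Ties are broken alphabetically so that the result is deterministic.
    static List<Map.Entry<String, Long>> topTracks(List<Play> plays, YearMonth month, int n) {
        return plays.stream()
            .filter(p -> YearMonth.from(p.playedOn).equals(month))
            .collect(Collectors.groupingBy(p -> p.artist + " - " + p.track, Collectors.counting()))
            .entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }

    // Cumulative play count per artist over all recorded plays.
    static Map<String, Long> playsPerArtist(List<Play> plays) {
        return plays.stream()
            .collect(Collectors.groupingBy(p -> p.artist, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(topTracks(sample(), YearMonth.of(2018, 10), 2));
        System.out.println(playsPerArtist(sample()));
    }
}

// Hypothetical stand-in for one row of "plays" joined with "tracks" and "artists".
final class Play {
    final String artist;
    final String track;
    final LocalDateTime playedOn;

    Play(String artist, String track, LocalDateTime playedOn) {
        this.artist = artist;
        this.track = track;
        this.playedOn = playedOn;
    }
}
```

Both methods are plain group-and-count aggregations, which is why a relational database handles this kind of question so well.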

2.1.2. Knowledge

Which tracks and albums fit my current top ten?
Which similar artists might I like?
What do the tracks I like the most have in common?
Are my favorite artists related, and did they feature each other?

3. Topics addressed

3.1. Neo4j

From our getting started guide:

Neo4j is an open-source NoSQL native graph database which provides an ACID-compliant transactional backend for your applications. With development starting in 2003, it has been publicly available since 2007.

— Neo4j getting started guide

While a relational database management system (RDBMS) stores relations between tuples (hence the tables themselves in an RDBMS are called relations), a graph database is designed to treat the relationships between data as a first-class citizen in the data model.

3.2. Spring Boot

The Spring Framework has been around since 2002 and is one of the oldest enterprise Java frameworks that is still actively used, maintained and developed. Spring Boot is ten years younger. It dates back to a ticket, "Improve support for containerless application", which already describes important goals for the first release in April 2014:

Fast start for all kinds of development with Spring
No generation of code or configuration
Easily configurable from the outside
Consistent component model
Lots of non-functional features out of the box, like metrics, health checks and so on
Improve the developer experience with other Spring related projects

The deployment scenario most often used with Spring Boot applications is the self-contained JAR, batteries included. That is, the JAR ships with all needed libraries, including a servlet container or similar when the application is a web application of some kind.

3.3. Spring Data

Let’s quote the Spring Data site:

Spring Data’s mission is to provide a familiar and consistent, Spring-based programming model for data access while still retaining the special traits of the underlying data store.

It makes it easy to use data access technologies, relational and non-relational databases, map-reduce frameworks, and cloud-based data services.

— What is Spring Data

Spring Data itself is an umbrella project with support for several quite different datastores, ranging from classic RDBMS over document and key-value stores to cloud-based services. Non-relational datastores certainly include Neo4j.

At the core of Spring Data lives the repository. There are several sources for the repository pattern. One is Martin Fowler's Patterns of Enterprise Application Architecture:

Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.

— Edward Hieatt and Rob Mee
A repository
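
As a minimal sketch of that definition in plain Java, assuming a made-up `Artist` domain object: the domain layer only sees the collection-like `ArtistRepository` interface, while the implementation behind it decides how objects are stored. None of these types come from Spring Data or from this repository's modules, and the in-memory map merely stands in for a real data mapping layer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class RepositorySketch {
    public static void main(String[] args) {
        ArtistRepository artists = new InMemoryArtistRepository();
        artists.save(new Artist(1L, "Artist A"));
        artists.save(new Artist(2L, "Artist B"));
        artists.findById(1L).ifPresent(a -> System.out.println(a.name));
        System.out.println(artists.findAll().size());
    }
}

// Hypothetical domain object, used for illustration only.
final class Artist {
    final Long id;
    final String name;

    Artist(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// The collection-like interface the domain layer talks to.
interface ArtistRepository {
    Artist save(Artist artist);
    Optional<Artist> findById(Long id);
    List<Artist> findAll();
    void deleteById(Long id);
}

// One possible data mapping layer behind the interface: a map in memory.
// A relational or graph-backed implementation could replace it without
// the callers noticing.
final class InMemoryArtistRepository implements ArtistRepository {
    private final Map<Long, Artist> store = new LinkedHashMap<>();

    @Override
    public Artist save(Artist artist) {
        store.put(artist.id, artist);
        return artist;
    }

    @Override
    public Optional<Artist> findById(Long id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public List<Artist> findAll() {
        return new ArrayList<>(store.values());
    }

    @Override
    public void deleteById(Long id) {
        store.remove(id);
    }
}
```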

Further down the road we’ll see why the distinction between domain and data mapping layer is important, namely when discussing the relationship between Spring Data Neo4j (SDN) and Object Graph Mapping (OGM).

You’ll also find the repository pattern prominently in Domain Driven Design (DDD). That reference is very nicely explained by our own Mark Needham in DDD: Repository pattern.

Regardless of whether you’re using a relational database or a graph database, you can access your aggregate roots in a consistent way. However, you still have to think for yourself about how to build and create those aggregate roots.

Spring Data repositories and the entities defined therein also support events, auditing and more. Some people fancy the dynamic query derivation from repository method names a lot.
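
The idea behind query derivation can be imitated in a few lines of plain Java. The following toy is not how Spring Data implements it and uses no Spring API at all: a JDK dynamic proxy reads the method name `findByName`, takes `name` as the property to compare, and filters an in-memory list. `Genre`, `GenreQueries` and `DerivedQuerySketch` are hypothetical names chosen for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class DerivedQuerySketch {

    // Creates an implementation of GenreQueries without anybody writing one by hand.
    static GenreQueries queriesOver(List<Genre> genres) {
        InvocationHandler handler = (proxy, method, args) -> {
            String methodName = method.getName();
            // The toy only understands "findBy<Property>" with exactly one argument.
            if (!methodName.startsWith("findBy") || methodName.length() <= 6
                    || args == null || args.length != 1) {
                throw new UnsupportedOperationException(methodName);
            }
            // "findByName" becomes the property "name".
            String property = Character.toLowerCase(methodName.charAt(6)) + methodName.substring(7);
            Field field = Genre.class.getDeclaredField(property);
            List<Genre> matches = new ArrayList<>();
            for (Genre genre : genres) {
                if (Objects.equals(field.get(genre), args[0])) {
                    matches.add(genre);
                }
            }
            return matches;
        };
        return (GenreQueries) Proxy.newProxyInstance(
            GenreQueries.class.getClassLoader(), new Class<?>[] {GenreQueries.class}, handler);
    }

    public static void main(String[] args) {
        List<Genre> genres = List.of(new Genre(1, "Rock"), new Genre(2, "Jazz"));
        GenreQueries queries = queriesOver(genres);
        System.out.println(queries.findByName("Jazz").size());
    }
}

// Only the method name carries the query: "findBy" followed by a property name.
interface GenreQueries {
    List<Genre> findByName(String name);
}

// Hypothetical entity, used for illustration only.
final class Genre {
    final long id;
    final String name;

    Genre(long id, String name) {
        this.id = id;
        this.name = name;
    }
}
```

A real implementation has to translate the parsed method name into a query for the underlying store instead of filtering a list, but the appeal is the same: declaring the method is enough.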

For nearly every store, Spring Data also provides lower-level access patterns, often in the form of an XXXTemplate or XXXOperations. We will also dive into that.

Spring Data relies on Spring's dependency injection mechanism and brings in some dependencies. It can be used without Spring Boot, but Spring Boot does a lot of useful autoconfiguration.

3.4. Neo4j OGM

Neo4j OGM stands for Object Graph Mapping and is used to map nodes, their properties and relationships returned from a graph to Java objects. While it is much easier to map nodes and their relationships from a graph database to a network of Java objects than to map rows returned from a relational database to objects (see Object-relational impedance mismatch), there are still edge cases:

Neo4j can be used without a schema. How to map basically arbitrary nodes to objects?
Cypher and Neo4j provide great means to do all kinds of projections. How to map those?
And most important: How to deal with possibly endless paths between nodes?

We’ll address all of those points.
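
The last point is easy to reproduce in plain Java. In the sketch below, three made-up artist nodes are related in a circle, so a naive "follow every relationship" traversal would never finish. One general way to keep it finite is to remember visited nodes and to stop after a fixed number of hops. `ArtistNode` and `BoundedTraversal` are hypothetical names; the code illustrates the problem and does not describe Neo4j OGM's own mechanism.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class BoundedTraversal {

    // Names of all artists reachable from start within at most maxDepth hops.
    // Breadth-first, so every node is reached along its shortest path first.
    static Set<String> reachableWithin(ArtistNode start, int maxDepth) {
        Set<ArtistNode> visited = new LinkedHashSet<>();
        visited.add(start);
        List<ArtistNode> frontier = List.of(start);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<ArtistNode> next = new ArrayList<>();
            for (ArtistNode node : frontier) {
                for (ArtistNode neighbour : node.relatedTo) {
                    // add() returns false for nodes seen before, which breaks cycles.
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        return visited.stream()
            .map(node -> node.name)
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        ArtistNode a = new ArtistNode("A");
        ArtistNode b = new ArtistNode("B");
        ArtistNode c = new ArtistNode("C");
        a.relatedTo.add(b);
        b.relatedTo.add(c);
        c.relatedTo.add(a); // closes the circle
        System.out.println(reachableWithin(a, 1));
        System.out.println(reachableWithin(a, 10));
    }
}

// Hypothetical node type: an artist that is related to other artists.
final class ArtistNode {
    final String name;
    final List<ArtistNode> relatedTo = new ArrayList<>();

    ArtistNode(String name) {
        this.name = name;
    }
}
```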

4. Building blocks

4.1. Modules

statsdb: Plain Java module that contains a Java DSL generated by jOOQ for the relational schema described here.
etl: Some stored procedures for Neo4j that implement an "extract, transform and load" mechanism, connecting PostgreSQL and Neo4j
charts: A revised version of bootiful-databases. For your reference, there is an English and a German talk on that.
knowledge: Finally, the Spring Boot and SDN based project that uses Neo4j to explore the relationship between artists, their tracks and albums.

4.2. Software needed

Java 11+
Maven is bundled with our repositories
Docker (Community edition) or a version that is bundled with your OS.
Java-IDE of your choice

4.3. Running the databases

To run the modules of this project, you need a PostgreSQL database with some defined schemas up and running. There’s a Docker module and a Docker Compose file to help you with that.

Note	As this example uses Neo4j Enterprise edition, you have to accept the license by putting a file named `neo4j-enterprise.env` into the `docker` directory that contains the line `NEO4J_ACCEPT_LICENSE_AGREEMENT=yes`!

Please run (cd docker && docker-compose -f docker-compose.yml -f docker-compose.default.yml up) from the root of this project. This brings up both a PostgreSQL instance and a Neo4j instance with APOC already installed. To stop the processes, use (cd docker && docker-compose stop).

The modules themselves can be built without running databases. statsdb uses docker-maven-plugin to bring up PostgreSQL for generating the jOOQ classes, and etl uses Testcontainers to do the same for PostgreSQL during integration tests. In addition, etl uses the Neo4j test harness for an in-memory Neo4j instance.

5. Further reading

6. About the author

Michael is a recognized Java Champion with more than 10 years of experience with the Spring Framework. He has been involved with Spring Boot right from the start. Michael works at Neo4j on Spring Data Neo4j and OGM. In his time before Neo4j, Michael did all kinds of stuff with "crazy" SQL. That involved time series management for power usage in the deregulated German energy market as well as fascinating analysis of spatial data, especially related to utility network plans, on the physical level as well as the logical level. The latter would have been a perfect use case for a Graph database like Neo4j: Which electric circuits travel along which power rods? Where do they intersect? Are there single points of failure?

Those engagements are part of the background for the first German book on Spring Boot and many SQL-related talks (English version and German version, both with video).

Graphs ("Graphs are everywhere") are quite obvious in the real world. In code and in a database, however, they are new to the author. Therefore, this repository and its articles may be able to address several things for different people:

Getting an idea how to work with data stored in Neo4j
What modern enterprise development with Spring Boot can look like
Where Spring Data Neo4j can help you and where you might want to avoid it

michael-simons/bootiful-music