/neo4j-csv-firehose

Primary LanguageJavaGNU General Public License v3.0GPL-3.0

neo4j-csv-firehose

Travis CI Status

neo4j-csv-firehose enables Neo4j’s LOAD CSV Cypher command to load other from other datasources as well. It provides on-the-fly conversion of the other datasource to csv - and can therefore act as input for LOAD CSV.

Installation

This project uses [gradle](http://www.gradle.org) as build system. Since the gradle wrapper in included, there is no need to download anything beforehand.

There are three distinct ways to install neo4j-csv-firehose:

  • directly inside Neo4j server as unmanaged extension

  • as external server with undertow

  • as URLStreamHandler hooking directly into to JVM

Installation within a Neo4j server instance (unmanaged extension)

To build the project just type:

./gradlew jar

Copy (or symlink) the resulting file ./build/libs/neo4j-csv-firehose-0.1-SNAPSHOT.jar to Neo4j’s plugins folder.

Copy any JDBC driver jar files for your relational databases into the plugins folder as well. E.g. for mysql use http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.36/mysql-connector-java-5.1.36.jar.

Change configuration in $NEO4J/conf/neo4j-server.properties:

org.neo4j.server.thirdparty_jaxrs_classes=org.neo4j.extension.firehose=/csv

If your Neo4j server uses authentication (which it does by default), amend to neo4j.properties:

extension.csv.http.username=neo4j
extension.csv.http.password=<mypassword>

After a restart using $NEO4J_HOME/bin/neo4j restart firehose is active.

Installation as a separate server

Under the hoods a undertow server with jaxrs support is start. Before running the server be sure to add your jdbc drivers to the runtime dependencies of build.gradle. To start up the server:

./gradlew run

This starts up the server on localhost:8080.

Installation as URLStreamHandler

Type

./gradlew jar

to build and copy (or symlink) the resulting file ./build/libs/neo4j-csv-firehose-0.1-SNAPSHOT.jar to Neo4j’s plugins folder.

Copy any JDBC driver jar files for your relational databases into the plugins folder as well. E.g. for mysql use http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.36/mysql-connector-java-5.1.36.jar.

In neo4j-wrapper.conf configure URL handlers:

wrapper.java.additional=-Djava.protocol.handler.pkgs=org.neo4j.extension.firehose

and restart Neo4j.

Usage

A new REST endpoint for rendering JDBC datasource as csv is added:

http://localhost:7474/csv/jdbc?parameter1=value1&parameter2=value2....

For separate server deployment the URL is (hint csv is omitted):

http://localhost:8080/jdbc?parameter1=value1&parameter2=value2....

For URLStreamHandler style deployment the URLs look like

jdbc:mysql://localhost/mydb?parameter=value&...

The following query parameters are available - remember to use proper url encoding for their values:

parameter name description mandatory

url

the jdbc connection string, syntax depends on your jdbc driver, e.g. jdbc:mysql://localhost/mydb

yes, not used for URLStreamHandler variant

user

database login for jdbc connection

no

password

database password for jdbc connection

no

table

name of the database table to convert to csv. All columns and rows are used

either table or sql

sql

a sql statement to be rendered to csv

either table or sql

This endpoint can then be used to run LOAD CSV in Cypher. We assume having a mysql database on localhost, dbname is mydb, credentials are mydb:123. We want to import from table person which has 2 columns: lastName and firstName. Using the following Cypher statement hits directly the mysql database and creates a node for each person. Here’s the version for running via unmanaged extensions:

neo4j-sh (?)$ load csv with headers from "http://localhost:7474/csv/jdbc?url=jdbc%3Amysql%3A%2F%2Flocalhost%2Fmydb&table=person&user=mydb&password=123" as line create (:Person {firstname: line.firstName, lastname: line.lastName});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2142
Properties set: 4284
Labels added: 2142
1422 ms

When running as separate server use:

neo4j-sh (?)$ load csv with headers from "http://localhost:8080/jdbc?url=jdbc%3Amysql%3A%2F%2Flocalhost%2Fmydb&table=person&user=mydb&password=123" as line create (:Person {firstname: line.firstName, lastname: line.lastName});

Running as URLStreamHandler:

neo4j-sh (?)$ load csv with headers from "jdbc:mysql://localhost/mydb?table=person&user=mydb&password=123" as line create (:Person {firstname: line.firstName, lastname: line.lastName});

MetaData Endpoint

There is an endpoint which returns the metadata from the database schema:

curl -i 'http://localhost:8080/jdbc/meta?url=jdbc:mysql://localhost/northwind?user=root&database=northwind'

It uses SchemaCrawler to access the information and returns it as JSON in this format:

{
  "nodes": [
    {
      "filename": "Categories",
      "labels": [
        "Categories"
      ],
      "properties": [
        {
          "headerKey": "CategoryID",
          "dataType": "string",
          "neoKey": "CategoryID",
          "primaryKey": true,
          "index": true
        },
....
        {
          "headerKey": "Picture",
          "dataType": "string",
          "neoKey": "Picture",
          "primaryKey": false,
          "index": false
        }
      ]
    },
....
  ],
  "relationships": [
    {
      "filename": "CustomerCustomerDemo",
      "name": "CUSTOMER_TYPE_ID",
      "from": {
        "fileKey": "CustomerTypeID",
        "neoKey": "CustomerTypeID",
        "filename": "CustomerCustomerDemo",
        "label": "CustomerCustomerDemo"
      },
      "to": {
        "fileKey": "CustomerTypeID",
        "neoKey": "CustomerTypeID",
        "filename": "CustomerDemographics",
        "label": "CustomerDemographics"
      }
    },
....
  ]
}