/neo4j-gtfs

A showcase of using neo4j for GTFS data

Primary LanguageJava

General Transit Feed Specification (GTFS) Neo4j Importer

1. Project Overview

This project is a showcase of spring-data, neo4j, rest and hateoas.

The Spring Boot based application provides the following features:

  • Performs an unattended download of General Transit Feed Specification (GTFS) data from New Jersey Transit.

  • Loads the downloaded GTFS files into Neo4j.

  • Provides web services, such as a trip planner API, to allow you to interact with the data.

You need an account as an NJ transit developer. You can sign up here.

This project is largely derived off the work that was done by Rick Van Bruggen (github account rvanbruggen)

Querying the data is based off his blog entry "Querying GTFS data - using Neo4j 2.3 - part 2/2"

2. Getting neo4j-gtfs Up and Running

2.1. What you’ll need

  • About 15 minutes

  • You need access to GTFS files. You have two approaches to load this data:

    • Create a developer account at NJ Transit, and have neo4j-gtfs download and load the file for you.

    • Download a GTFS feed file from any transportation provider, and have neo4j-gtfs load it for you.

  • JDK 1.8 or later

  • Gradle 2.3+ or Maven 3.0+

  • You can also import the code straight into your IDE:

2.2. Standing up a Neo4j server

Before you can build a this application, you need to set up a Neo4j server.

Neo4j has an open source server you can install for free:

On a Mac, just type:

$ brew install neo4j

Also on a mac, make sure the Neo4j imports folder has proper write permissions. This should take care of it:

chmod -R g+w /usr/local/Cellar/neo4j/3.1.4/libexec/import

Once you installed, launch it with it’s default settings:

$ neo4j start

You should see a message like this:

Starting Neo4j.
Started neo4j (pid 96416). By default, it is available at http://localhost:7474/
There may be a short delay until the server is ready.
See /usr/local/Cellar/neo4j/3.0.6/libexec/logs/neo4j.log for current status.

By default, Neo4j has a username/password of neo4j/neo4j. However, it requires that the new account password be changed. To do so, execute the following command:

$ curl -v -u neo4j:neo4j -X POST localhost:7474/user/neo4j/password -H "Content-type:application/json" -d "{\"password\":\"secret\"}"

This changes the password from neo4j to secret (something to NOT DO in production!) With that completed, you should be ready to run this guide.

2.3. Permissions to access Neo4j and the NJ Transit Website

Neo4j Community Edition requires credentials to access it. This can be configured with a couple of properties.

spring.data.neo4j.username=neo4j
spring.data.neo4j.password=secret

This includes the default username neo4j and the newly set password secret we picked earlier.

Similarly, your login to download the GTFS files from the New Jersey Transit website are stored in this file.

njgtfs.login=foo
njgtfs.password=bar
Warning
Do NOT store real credentials in your source repository. Instead, configure them in your runtime using Spring Boot’s property overrides.

2.4. Build an executable JAR

You can run the application from the command line with Gradle or Maven. Or you can build a single executable JAR file that contains all the necessary dependencies, classes, and resources, and run that. This makes it easy to ship, version, and deploy the service as an application throughout the development lifecycle, across different environments, and so forth.

If you are using Gradle, you can run the application using ./gradlew bootRun. Or you can build the JAR file using ./gradlew build. Then you can run the JAR file:

java -jar build/libs/neo4j-gtfs-0.1.0.jar

If you are using Maven, you can run the application using ./mvnw spring-boot:run. Or you can build the JAR file with ./mvnw clean package. Then you can run the JAR file:

java -jar target/neo4j-gtfs-0.1.0.jar
Note
The procedure above will create a runnable JAR. You can also opt to build a classic WAR file instead.

The server comes up on http://localhost:8080 by default.

With this in place, let’s load up the data and interact with it.

3. Using neo4j-gtfs

3.1. Loading the data

Two endpoints are provided:

  • Dowload and import the data fully automated from the NJ Transit developer website:
    http://localhost:8080/customrest/LoadData

  • The NJ GTFS download is clickwrapped, so getting things to work automated has been very temperamental.
    If the /customrest/LoadData endpoint does not succeed, you can import a pre-downloaded zip file by:

3.2. Interact with the data via Spring-Data-Rest

By default all the endpoints exposed via spring-data-rest are left in place. You can traverse through those by accessing the root of the app server.:

To understand how this works, read the page Understanding HATEOS prepared by the Spring community.

3.3. Interact with the data via custom web services

The application also exposes web services hosting custom cypher queries for trip planning. Currently only one such endpoint exists, and it is purpose built to provide trip options from one station to another given departure and arrival time criteria:

curltests/planTrip.sh

#!/usr/bin/env bash
curl -H "Content-Type: application/json" -X POST --data @TripPlanNoTransfer.json http://localhost:8080/customrest/planTripNoTransfer | python -m json.tool

curltests/TripPlan.json

{
  "serviceId":"4",
  "origStation":"WOOD-RIDGE",
  "origArrivalTimeLow" :"07:00:00",
  "origArrivalTimeHigh" :"08:10:00",
  "destStation" :"HOBOKEN",
  "destArrivalTimeLow":"06:30:00",
  "destArrivalTimeHigh":"10:00:00"
}

Response - 2 possible trips - One leaving at 7:43 and the second leaving at 7:27:

[
    [
        {
            "arrivalTime": "07:43:00",
            "departureTime": "07:43:00",
            "stopName": "WOOD-RIDGE",
            "stopSequence": 15,
            "tripId": "2815"
        },
        {
            "arrivalTime": "07:54:00",
            "departureTime": "07:54:00",
            "stopName": "FRANK R LAUTENBERG SECAUCUS LOWER LEVEL",
            "stopSequence": 16,
            "tripId": "2815"
        },
        {
            "arrivalTime": "08:05:00",
            "departureTime": "08:05:00",
            "stopName": "HOBOKEN",
            "stopSequence": 17,
            "tripId": "2815"
        }
    ],
    [
        {
            "arrivalTime": "07:27:00",
            "departureTime": "07:27:00",
            "stopName": "WOOD-RIDGE",
            "stopSequence": 16,
            "tripId": "2821"
        },
        {
            "arrivalTime": "07:38:00",
            "departureTime": "07:38:00",
            "stopName": "FRANK R LAUTENBERG SECAUCUS LOWER LEVEL",
            "stopSequence": 17,
            "tripId": "2821"
        },
        {
            "arrivalTime": "07:49:00",
            "departureTime": "07:49:00",
            "stopName": "HOBOKEN",
            "stopSequence": 18,
            "tripId": "2821"
        }
    ]
]

curltests/planTripOneStop.sh

#!/usr/bin/env bash
curl -H "Content-Type: application/json" -X POST --data @TripPlanOneTransfer.json http://localhost:8080/customrest/planTripOneTransfer | python -m json.tool

curltests/TripPlanOneStop.json

{
  "serviceId":"4",
  "origStation":"WOOD-RIDGE",
  "origArrivalTimeLow" :"06:30:00",
  "origArrivalTimeHigh" :"07:10:00",
  "destStation" :"RUTHERFORD",
  "destArrivalTimeLow":"06:30:00",
  "destArrivalTimeHigh":"10:00:00"
}

Response 1 trip with one tramsfer leaves origin at 6:46 arrives at midpoint at 6:56, the transfer train leaves at 8:09 and arrives the destination at 8:09

[
    [
        {
            "arrivalTime": "06:46:00",
            "departureTime": "06:46:00",
            "stopName": "WOOD-RIDGE",
            "stopSequence": 16,
            "tripId": "2820"
        },
        {
            "arrivalTime": "06:56:00",
            "departureTime": "06:56:00",
            "stopName": "FRANK R LAUTENBERG SECAUCUS LOWER LEVEL",
            "stopSequence": 17,
            "tripId": "2820"
        }
    ],
    [
        {
            "arrivalTime": "08:09:00",
            "departureTime": "08:09:00",
            "stopName": "FRANK R LAUTENBERG SECAUCUS LOWER LEVEL",
            "stopSequence": 2,
            "tripId": "1249"
        },
        {
            "arrivalTime": "08:17:00",
            "departureTime": "08:17:00",
            "stopName": "RUTHERFORD",
            "stopSequence": 3,
            "tripId": "1249"
        }
    ]
]

3.4. Interact with the data using Cypher

Open your browser to Neo4j’s own Cypher query tool by opening your browser to http://localhost:7474/ and start writing cypher queries like the ones below

3.4.1. Find a DIRECT route with range conditions

MATCH
  (orig:Stop {name: "WESTWOOD"})--(orig_st:Stoptime)-[r1:PART_OF_TRIP]->(trp:Trip)
WHERE
  orig_st.departure_time > "06:30:00"
  AND orig_st.departure_time < "07:10:00"
  AND trp.service_id="4"
WITH
  orig, orig_st
MATCH
    (dest:Stop {name:"HOBOKEN"})--(dest_st:Stoptime)-[r2:PART_OF_TRIP]->(trp2:Trip)
WHERE
    dest_st.arrival_time < "08:00:00"
    AND dest_st.arrival_time > "07:00:00"
    AND dest_st.arrival_time > orig_st.departure_time
    AND trp2.service_id="4"
WITH
    dest,dest_st,orig, orig_st
MATCH
    p = allshortestpaths((orig_st)-[*]->(dest_st))
WITH
    nodes(p) as n
UNWIND
    n as nodes
//MATCH
//  p=((nodes)-[loc:LOCATED_AT]->(stp:Stop))
OPTIONAL MATCH
    p=(nodes)-[r:PRECEDES|LOCATED_AT]->(next)
RETURN
    p, COALESCE(nodes.stop_sequence, next.stop_sequence AS stopSequence
ORDER BY stopSequence;
route_and_stops

3.4.2. Plan an indirect route

//find the route and the stops for the indirect route
MATCH
    (t:Stop),(a:Stop)
WHERE
    t.name = "WESTWOOD"
AND
    a.name="HOBOKEN"
WITH
    t,a
MATCH
    p = allshortestpaths((t)-[*]-(a))
WHERE
    NONE (x in relationships(p) where type(x)="OPERATES")
RETURN
    p
LIMIT
    10;
route_and_stops

3.4.3. Plan a specific indirect route with all stop and trip information.

MATCH
    p3=(orig:Stop {name:"WESTWOOD"})<-[:LOCATED_AT]-(st_orig:Stoptime)-[r1:PART_OF_TRIP]->(trp1:Trip),
    p4=(dest:Stop {name:"RUTHERFORD"})<-[:LOCATED_AT]-(st_dest:Stoptime)-[r2:PART_OF_TRIP]->(trp2:Trip),
    p1=(st_orig)-[im1:PRECEDES*]->(st_midway_arr:Stoptime),
    p5=(st_midway_arr)-[:LOCATED_AT]->(midway:Stop)<-[:LOCATED_AT]-(st_midway_dep:Stoptime),
    p2=(st_midway_dep)-[im2:PRECEDES*]->(st_dest)
WHERE
  st_orig.departure_time > "07:30:00"
  AND st_orig.departure_time < "09:00:00"
  AND st_dest.arrival_time < "10:30:00"
  AND st_dest.arrival_time > "09:00:00"
  AND st_midway_arr.arrival_time > st_orig.departure_time
  AND st_midway_dep.departure_time > st_midway_arr.arrival_time
  AND st_dest.arrival_time > st_midway_dep.departure_time
  AND trp1.service_id = "4"
  AND trp2.service_id = "4"
WITH
  st_orig, st_dest, nodes(p1) + nodes(p2) AS allStops1
ORDER BY
    (st_dest.arrival_time_int-st_orig.departure_time_int) ASC
SKIP 0 LIMIT 1
UNWIND
  allStops1 AS stoptime
MATCH
  p6=(loc:Stop)<-[r:LOCATED_AT]-(stoptime)-[r2:PART_OF_TRIP]->(trp5:Trip),
  (stoptime)-[im1:PRECEDES*]->(stoptime2)
RETURN
  p6, im1
ORDER BY stoptime.departure_time_int ASC
;
route_and_stops

3.5. Create your own services

Add new queries to the repository com.popameeting.gtfs.neo4j.repository and interact with them via Spring-Data-Rest’s provided web services - if you need the data presented differently see the projections in com.popameeting.gtfs.neo4j.entity.projection and how they are being used in the URL above.