An uber data pipeline sample app. Play Framework, Akka Streams, Kafka, Flink, Spark Streaming, and Cassandra.
Start Kafka:
./sbt kafkaServer/run
Web App:
- Obtain an API key from mapbox.com
- Start the Play web app:
MAPBOX_ACCESS_TOKEN=YOUR-MAPBOX-API-KEY ./sbt webapp/run
Try it out:
- Open the driver UI: http://localhost:9000/driver
- Open the rider UI: http://localhost:9000/rider
- In the Rider UI, click on the map to position the rider
- In the Driver UI, click on the rider to initiate a pickup
Start Flink:
./sbt flinkClient/run
- Initiate a few pickups and see the average pickup wait time change (in the stdout console for the Flink process)
Start Cassandra:
./sbt cassandraServer/run
Start the Spark Streaming process:
./sbt kafkaToCassandra/run
- Watch all of the ride data be micro-batched from Kafka to Cassandra
Setup PredictionIO Pipeline:
-
Setup PIO
-
Set the PIO Access Key:
export PIO_ACCESS_KEY=<YOUR PIO ACCESS KEY>
-
Start the PIO Pipeline:
./sbt pioClient/run
Copy demo data into Kafka or PIO:
For fake data, run:
./sbt "demoData/run <kafka|pio> fake <number of records> <number of months> <number of clusters>"
For New York data, run:
./sbt "demoData/run <kafka|pio> ny <number of months> <sample rate>"
Start the Demand Dashboard
PREDICTIONIO_URL=http://asdf.com MAPBOX_ACCESS_TOKEN=YOUR_MAPBOX_TOKEN ./sbt demandDashboard/run