In this demo we take the crime dataset from the City of Chicago, turn it into a streaming data source, and process the data along two paths:
- an online path, which stores the data in a time series database (InfluxDB) and visualizes the crime types in Grafana.
- an offline path, which uses Spark jobs to create an overlay heat map of aggregated crimes on Google Maps.
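To make the offline path's aggregation step concrete, here is a minimal bash/awk sketch that bins crime locations into grid cells and counts the crimes in each cell, which is the kind of input a heat-map overlay consumes. The demo itself does this with Spark jobs. This version only illustrates the idea. It assumes a simplified CSV with a header row and the hypothetical columns `primary_type,latitude,longitude` (no quoted or embedded commas). The function name `aggregate_cells` and the rounding-based cell size are illustrative choices.

```bash
#!/usr/bin/env bash
# Sketch: bin crime locations into grid cells and count crimes per cell.
# ASSUMPTION: simplified CSV with a header row and the hypothetical
# columns primary_type,latitude,longitude (no quoted or embedded commas).
aggregate_cells() {
  local places="${1:-2}"   # decimal places kept; fewer places = coarser cells
  awk -F, -v places="$places" '
    BEGIN { fmt = "%." places "f" }        # e.g. "%.2f"
    NR == 1 { next }                       # skip the header row
    $2 == "" || $3 == "" { next }          # skip records without coordinates
    {
      # Round latitude and longitude to form the cell key "lat,lon".
      key = sprintf(fmt, $2) "," sprintf(fmt, $3)
      count[key]++
    }
    END { for (k in count) print k "," count[k] }
  ' | sort                                 # awk iteration order is unspecified
}

# Usage:
#   printf '%s\n' 'primary_type,latitude,longitude' \
#     'THEFT,41.8812,-87.6278' 'BATTERY,41.8809,-87.6301' \
#     'THEFT,41.7511,-87.6043' | aggregate_cells 2
# prints:
#   41.75,-87.60,1
#   41.88,-87.63,2
```

Keeping fewer decimal places produces coarser cells. The real dataset's CSV contains quoted fields, so a proper CSV parser (such as Spark's) is needed there instead of splitting on commas.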
First you need to clone this repo:
$ git clone https://github.com/mesosphere/time-series-demo.git && cd time-series-demo/
Then you can set up and launch the components:
- Set up Kafka and Spark.
- Set up and launch the Crime Data Producer.
- Set up, configure, and launch the online path (InfluxDB and Grafana).
- Set up Kubernetes and the offline reporting Web app, then launch it.
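The Crime Data Producer's job is to turn a static dataset into a stream. The bash sketch below shows that idea in its simplest form by replaying a CSV file line by line with a fixed delay. It is not the producer shipped in this repo. The function name `replay_records`, the delay, and the file name, Kafka broker address, and topic in the commented usage are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: replay a static CSV file as a stream of records.
# Illustrative only; this is NOT the Crime Data Producer from this repo.
replay_records() {
  local delay="${1:-0.1}"   # seconds between records (fractions need GNU sleep or similar)
  local first=1 line
  # The "|| [ -n "$line" ]" check keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$first" -eq 1 ]; then   # drop the CSV header row
      first=0
      continue
    fi
    printf '%s\n' "$line"
    sleep "$delay"
  done
}

# Hypothetical usage: pipe the stream into Kafka's console producer.
# The file name, broker address, and topic name are placeholders.
#   replay_records 0.5 < crimes.csv |
#     kafka-console-producer.sh --broker-list broker.example:9092 --topic crimes
```

A fixed delay is the simplest pacing choice. A more faithful replay could instead space records according to their original timestamps.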
We've also recorded a walkthrough of the setup and launch steps as asciicasts:
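You can also check out the overview deployment doc for more details.

The demo uses the following components; {ALL} marks those used by both paths, while {ONLINE} and {OFFLINE} mark those used by only one: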
- Mesosphere DCOS 1.3 {ALL}
- Marathon 0.11.1 {ALL}
- Spark 1.5 {ALL}
- Kubernetes 1.0.6 {OFFLINE}
- InfluxDB 0.9.4 {ONLINE}
- Grafana 2.1.3 {ONLINE}
- heatmap.js 2.0 {OFFLINE}
- AWS S3 and the CLI {OFFLINE}
- Docker Hub
- Offline reporting Web UI {OFFLINE}
- S3 fetcher {OFFLINE}
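On the online path each crime event ends up as a point in InfluxDB. As a hedged sketch, assuming one point per crime with the crime type as a tag and a constant field, the bash functions below format a record in InfluxDB's line protocol (`measurement,tag=value field=value [timestamp]`, with spaces, commas, and equals signs in tag values escaped by a backslash). The measurement name `crimes`, the tag `type`, the field `count`, and the function names are hypothetical, not the schema this demo uses.

```bash
#!/usr/bin/env bash
# Sketch: format one crime event as an InfluxDB line-protocol point.
# ASSUMPTION: hypothetical schema with measurement "crimes", tag "type",
# and a constant float field "count"; not the schema used by this demo.

# Backslash-escape spaces, commas, and equals signs, as the line
# protocol requires inside tag values.
escape_tag() {
  printf '%s' "$1" | sed 's/[ ,=]/\\&/g'
}

# crime_line TYPE [TIMESTAMP_NS]
# Without a timestamp, InfluxDB assigns the server's time on write.
crime_line() {
  local type_tag
  type_tag="$(escape_tag "$1")"
  if [ -n "${2:-}" ]; then
    printf 'crimes,type=%s count=1 %s\n' "$type_tag" "$2"
  else
    printf 'crimes,type=%s count=1\n' "$type_tag"
  fi
}

# Example:
#   crime_line 'CRIMINAL DAMAGE' 1420070400000000000
# prints:
#   crimes,type=CRIMINAL\ DAMAGE count=1 1420070400000000000
# Writing it over the HTTP API, assuming a local InfluxDB 0.9.x on the
# default port (database name "chicago" is a placeholder):
#   crime_line THEFT | curl -i -XPOST 'http://localhost:8086/write?db=chicago' --data-binary @-
```

Using the crime type as a tag keeps it indexed, so a Grafana panel can group counts by type without scanning field values.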
To do:
- Create video walkthrough (Michael H)
- Add real timestamps to InfluxDB data (Michael G)
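The second to-do item would use each record's own time as the point's timestamp. A minimal sketch, assuming the dataset's date values look like `01/01/2015 11:55:00 PM`, converts such a value to the nanosecond epoch timestamp that the line protocol expects. It uses pure bash arithmetic (the days-from-civil calculation) so it does not depend on a particular `date` implementation. It treats the wall-clock time as UTC. Because the data is recorded in Chicago local time, a real fix would also need to apply the America/Chicago offset. The function name `chicago_date_to_ns` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: convert a Chicago-style date ("MM/DD/YYYY hh:mm:ss AM|PM") to
# nanoseconds since the Unix epoch for use as an InfluxDB timestamp.
# ASSUMPTIONS: input format as above; the time is treated as UTC (the real
# data is Chicago local time, so an America/Chicago offset is still needed);
# years are >= 1.
chicago_date_to_ns() {
  local re='^([0-9]{2})/([0-9]{2})/([0-9]{4}) ([0-9]{2}):([0-9]{2}):([0-9]{2}) (AM|PM)$'
  [[ $1 =~ $re ]] || return 1
  # 10# forces base 10 so values like "08" are not read as octal.
  local mo=$((10#${BASH_REMATCH[1]})) d=$((10#${BASH_REMATCH[2]}))
  local y=$((10#${BASH_REMATCH[3]}))
  local h=$((10#${BASH_REMATCH[4]} % 12))   # 12 AM -> 0; PM adds 12 below
  local mi=$((10#${BASH_REMATCH[5]})) s=$((10#${BASH_REMATCH[6]}))
  if [ "${BASH_REMATCH[7]}" = PM ]; then
    h=$((h + 12))
  fi
  # Days since 1970-01-01 via days-from-civil (years start in March).
  local yy=$(( mo <= 2 ? y - 1 : y ))
  local era=$(( yy / 400 ))
  local yoe=$(( yy - era * 400 ))                # year of era, 0..399
  local mp=$(( mo > 2 ? mo - 3 : mo + 9 ))       # March = 0 ... February = 11
  local doy=$(( (153 * mp + 2) / 5 + d - 1 ))    # day of year, 0..365
  local doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  local days=$(( era * 146097 + doe - 719468 ))
  local secs=$(( days * 86400 + h * 3600 + mi * 60 + s ))
  printf '%d\n' "$(( secs * 1000000000 ))"
}

# Example:
#   chicago_date_to_ns '01/01/2015 12:00:00 AM'
# prints:
#   1420070400000000000
```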