Marinewatch

Find the best navigable routes.

See the slides or the report (in French) for details.

Overview

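General

[Figure: general overview (marinwatch)]

Lambda Architecture

[Figure: lambda architecture with Spark and Neo4j (marinwatch_spark_neo4j)]

Batch processing

[Figure: batch processing (marinwatch_batch_processing)]

Streaming processing

[Figure: streaming processing (marinwatch_streaming_processing)]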

Developer setup

Requirements

  • rbenv
  • docker (+ docker-compose)
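
Before going further, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch; the helper name `missing_tools` is hypothetical and not part of the project. It assumes the standalone `docker-compose` binary used in the rest of this README.

```shell
# Print each required command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of marinewatch.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing="$(missing_tools rbenv docker docker-compose)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing required tools:" $missing
else
  echo "All required tools found."
fi
```

If anything is reported missing, install it before running the setup steps.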

Project setup

Clone the repository and go to the project directory

git clone git@github.com:gautierdelorme/marinewatch.git
cd marinewatch

Start docker containers

docker-compose up -d

Install the required version of Ruby

rbenv install
rbenv rehash

Install Bundler

gem install bundler
rbenv rehash

Install the required gems for the web API

cd ./code/web_api
bundle install
rbenv rehash

Project Overview

Technologies

  • Docker as container manager
  • Batch and streaming processing using Spark and HDFS (written in Scala)
  • Neo4j as graph database
  • Web and streaming APIs built in Ruby (using Sinatra)
  • CLI tool written in pure Bash
  • Git as version control system

Architecture

  • code/: contains all the marinewatch code
    • mwspark: Spark application written in Scala
      • Batch processing that generates structured data to be imported into Neo4j
      • Streaming processing that updates Neo4j data in real time
    • web_api: Web API that returns the shortest path between two geographic coordinates (supported formats: HTML, JSON and KML)
    • streaming_api: Streaming API used to push new updates from boats
  • data/: contains all data files
    • input: Data files used by Spark jobs
    • output: Data files generated by Spark jobs
  • neo4j/: contains the Neo4j files
    • conf: Neo4j config files
    • data: Neo4j databases
    • logs: Logs generated by Neo4j
    • plugins: Plugins used by Neo4j
  • docker-compose.yml: Docker Compose config file
  • marinewatch-cli: CLI tool used to manage the app

How it works

Important:

  • You need to have docker running
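
One way to check this requirement from a shell is to ask the Docker daemon for its status. This is a small sketch; the function name `docker_running` is hypothetical. It relies on `docker info` exiting with a non-zero status when the daemon cannot be reached.

```shell
# Return 0 if the Docker daemon answers, non-zero otherwise.
# docker_running is a hypothetical helper, not part of marinewatch-cli.
docker_running() {
  docker info >/dev/null 2>&1
}

if docker_running; then
  echo "Docker is running."
else
  echo "Docker is not reachable; start it before using marinewatch-cli." >&2
fi
```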

Run the batch Spark job with the specified accuracy

./marinewatch-cli -b 40

Run the streaming Spark job, listening on the specified address

./marinewatch-cli -c <address>

Create Neo4j database

./marinewatch-cli -u dbname
# creates a new database named dbname,
# imports the new data into it, starts the database,
# and creates an index on (latitude, longitude)

Start existing Neo4j database

./marinewatch-cli -d dbname

Start web API

./marinewatch-cli -s
# For example, open this endpoint to see a result:
# http://localhost:4567/route?from=39.425,6.825&to=6.225,103.050
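
The example endpoint above takes two coordinate pairs as the `from` and `to` query parameters. The sketch below builds such a URL for arbitrary coordinates. The helper name `route_url` is hypothetical, and the base URL default is taken from the example above. It assumes the pairs are written as `latitude,longitude`. The README does not show how to select the JSON or KML format, so the sketch does not attempt it.

```shell
# Build a /route URL from two "latitude,longitude" pairs.
# route_url is a hypothetical helper; the default base URL comes from the example above.
route_url() {
  local from="$1" to="$2" base="${3:-http://localhost:4567}"
  printf '%s/route?from=%s&to=%s\n' "$base" "$from" "$to"
}

route_url 39.425,6.825 6.225,103.050

# With the web API running (./marinewatch-cli -s), the URL can be fetched with curl:
# curl -s "$(route_url 39.425,6.825 6.225,103.050)"
```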

Batch processing + New database + Web Server in a single command

./marinewatch-cli -s -b 40 -u dbname

Full CLI Documentation

$ ./marinewatch-cli -h
Usage: marinewatch-cli [-h] [-b <int>] [-c <string>] [-u <string>] [-d <string>] [-s] [-t]

  -h  Help. Display this message and quit.
  -b  <int>  Run batch process with specified accuracy.
  -c  <string>  Run streaming process listening on specified address.
  -u  <string>  Create new database with name.
  -d  <string>  Start database with name.
  -s  Start web server.
  -t  Start streaming server.
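
For readers curious how a pure Bash tool can handle an option set like this, here is a minimal sketch using the `getopts` builtin. It is not the actual marinewatch-cli source. The function name `parse_args` and all variable names are hypothetical, and only the option letters come from the usage message above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option parsing; NOT the real marinewatch-cli code.
parse_args() {
  local opt OPTIND=1
  accuracy="" stream_addr="" new_db="" start_db="" web=0 stream_server=0
  # Leading ':' enables silent mode so missing arguments are handled below.
  while getopts ":hb:c:u:d:st" opt; do
    case "$opt" in
      h) echo "Usage: marinewatch-cli [-h] [-b <int>] [-c <string>] [-u <string>] [-d <string>] [-s] [-t]"
         return 0 ;;
      b) accuracy="$OPTARG" ;;
      c) stream_addr="$OPTARG" ;;
      u) new_db="$OPTARG" ;;
      d) start_db="$OPTARG" ;;
      s) web=1 ;;
      t) stream_server=1 ;;
      :) echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
      \?) echo "Unknown option: -$OPTARG" >&2; return 1 ;;
    esac
  done
}

parse_args -s -b 40 -u dbname
echo "accuracy=$accuracy new_db=$new_db web=$web"
```

In this sketch the flags only set variables, so a script built this way could run the batch job, database creation and web server in a fixed order regardless of how the flags were typed.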

Planned improvements

  • Improve speed
  • Use datasets with better accuracy (1/40)
  • Add Spark Streaming processing
  • Do not restrict the route cost to boat density
  • ...

License

This project is licensed under the MIT License. See the LICENSE file for details.