/RTBDP-project

Project for Real-Time Big Data Processing at unibz

Real-time visualization of delays in Norwegian public transport

This repository is part of a project in the course Real-Time Big Data Processing at unibz.

Requirements

This project requires:

  • Docker and Docker Compose
  • Python 3.7
  • Yarn and Node 16

Setup

Kafka

All setup instructions assume a *nix system, preferably some Linux variant.

Copy the example environment file (most likely it does not need to be modified): cp .env.example .env

Then start all the Docker containers for Kafka and Flink: docker-compose up -d.

Python setup

The producer and processor can run using the same Python interpreter and virtual environment.

# Create virtual environment
$ python3.7 -m venv venv
$ source venv/bin/activate
# Install dependencies
$ pip install -r requirements.txt

NOTE: It's important to run the producer before the processor, as the producer creates the Kafka topic that the processor reads from.

Run the producer:

$ python producer/producer.py
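
For orientation, below is a minimal sketch of what a producer along these lines might look like, using kafka-python. The broker address, topic name, and record fields are assumptions, and the real producer pulls its data from the Norwegian public transport feed rather than generating dummy records.

# sketch_producer.py -- illustrative only, not the project's producer
import json
import time

from kafka import KafkaProducer
from kafka.admin import KafkaAdminClient, NewTopic
from kafka.errors import TopicAlreadyExistsError

BROKER = "localhost:9092"  # hypothetical; the real address comes from .env
TOPIC = "delays"           # hypothetical topic name

def ensure_topic():
    # The producer is responsible for creating the topic (see the note above).
    admin = KafkaAdminClient(bootstrap_servers=BROKER)
    try:
        admin.create_topics([NewTopic(name=TOPIC, num_partitions=1,
                                      replication_factor=1)])
    except TopicAlreadyExistsError:
        pass
    finally:
        admin.close()

def run():
    ensure_topic()
    producer = KafkaProducer(
        bootstrap_servers=BROKER,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        # In the real producer this record would come from the transport feed.
        record = {"line": "31", "delay_seconds": 120, "timestamp": time.time()}
        producer.send(TOPIC, record)
        producer.flush()
        time.sleep(10)

if __name__ == "__main__":
    run()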

Run the processor (in another shell):

$ python processor/processor.py

For the processor, you also need the JAR for the Kafka SQL connector for Flink. The latest version can be found here.

As of writing, version 1.15.0 is used.
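
For reference, here is a minimal PyFlink 1.15 sketch of a processor that reads the topic through the Kafka SQL connector. The JAR path, topic name, and table schema are assumptions, not the project's actual definitions.

# sketch_processor.py -- illustrative only, not the project's processor
from pyflink.table import EnvironmentSettings, TableEnvironment

# Streaming Table API environment (PyFlink 1.15)
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Point Flink at the Kafka SQL connector JAR downloaded earlier (path is an assumption).
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file:///path/to/flink-sql-connector-kafka-1.15.0.jar")

# Source table backed by the Kafka topic; topic name and columns are assumptions.
t_env.execute_sql("""
    CREATE TABLE delays (
        line STRING,
        delay_seconds INT,
        `timestamp` DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'delays',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'processor',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Example query: average delay per line, printed continuously.
result = t_env.sql_query("SELECT line, AVG(delay_seconds) FROM delays GROUP BY line")
result.execute().print()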

Web app

Open another shell to run the front end alongside the stream processor:

$ cd webapp

To use the Google Maps API, you need an API key from Google Cloud. Copy the environment template with cp .env.example .env.local and fill in the API key in the .env.local file.

First, run the kafka-proxy. This application creates a WebSocket proxy to Kafka so that the Kafka topic can be accessed directly from the browser.

$ node kafka-proxy.js
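
The actual proxy is implemented in Node (kafka-proxy.js), but the idea can be sketched in Python as well, assuming kafka-python and the websockets package are installed. The topic name, broker address, and port below are assumptions.

# sketch_kafka_ws_proxy.py -- illustrative only, not kafka-proxy.js
import asyncio
import uuid

import websockets                 # assumed dependency
from kafka import KafkaConsumer   # assumed dependency (kafka-python)

TOPIC = "delays"                  # hypothetical topic name
BROKER = "localhost:9092"         # hypothetical broker address

async def forward(websocket, path=None):
    # One consumer per browser connection, each with its own consumer group.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKER,
        group_id=f"proxy-{uuid.uuid4()}",
        auto_offset_reset="latest",
    )
    loop = asyncio.get_running_loop()
    try:
        while True:
            # poll() blocks, so run it in a thread to keep the event loop free.
            batch = await loop.run_in_executor(
                None, lambda: consumer.poll(timeout_ms=1000))
            for records in batch.values():
                for record in records:
                    # Forward the raw Kafka message to the browser.
                    await websocket.send(record.value.decode("utf-8"))
    finally:
        consumer.close()

async def main():
    async with websockets.serve(forward, "localhost", 8080):
        await asyncio.Future()  # run until interrupted

if __name__ == "__main__":
    asyncio.run(main())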

Then, start the webapp:

# Install dependencies
$ yarn
# Run the web application
$ yarn dev
# The application should now be running on http://localhost:3000