
Flights Monitoring | Lambda Architecture Implementation (API + KAFKA + SPARK + MONGO + GRAFANA)


Index

  • Introduction
  • Quickstart
    • Start the containers
    • Check the status of MongoDB Sink Kafka Connector
    • Start Grafana-Mongo Proxy
    • Start data ingestion
    • Start Spark ReadStream Session
    • Access dashboard in Grafana
    • Stop the streaming
    • Shutdown the containers
  • Monitoring
    • Kafka Cluster
    • Kafka-Mongo Connector
    • Spark Jobs
    • MongoDB
  • Components
  • Troubleshooting

Introduction

The following project was developed with the goal of applying current leading technologies for both batch and real-time data management and analysis.
The data source is FlightRadar24, a website that offers real-time flight tracking services. Through an API it is possible to obtain real-time information about flights around the globe. The developed dashboard allows the user to obtain general information about past days' flights, in addition to real-time analytics on live air traffic in each zone of interest.
The real-time nature of the chosen data source calls for an architecture that can cope with a large volume of data arriving in a short period of time. The architecture developed makes it possible to store the data and, at the same time, analyze it in real time. The pipeline consists of several stages, each using current technologies for Real-Time Data Processing.
All of this was developed in a Docker environment to enable future testing and a full replication of the project.

[Pipeline architecture diagram]

Quickstart

Note: Due to file size limits imposed by GitHub, the collections regarding flights and real-time data have been uploaded empty.

Warning: It is recommended to allocate at least 5 GB of memory to Docker. If you have less, change it in the Docker settings.

Open three terminals in the project folder.
In the first one:

  1. Start the containers
> docker compose up -d

Wait about 30 seconds and run the following command:

  2. Check the status of the MongoDB Sink Kafka Connector
> curl localhost:8083/connectors/mongodb-connector/status | jq
OUTPUT
{
  "name": "mongodb-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "connect:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "connect:8083"
    },
    {
      "id": 1,
      "state": "RUNNING",
      "worker_id": "connect:8083"
    }
  ],
  "type": "sink"
}

Make sure you have jq installed; otherwise install it or remove it from the command.
If everything is RUNNING, go ahead with the next steps; otherwise wait about 10 more seconds and retry. If the error persists, see the Troubleshooting section.

  3. Start Grafana-Mongo Proxy
> docker exec -d grafana npm run server
  4. Start data ingestion

In the second terminal:

> docker exec -it jupyter python producer.py
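For reference, the core of a script like producer.py amounts to fetching live flights and publishing them to Kafka. Below is a minimal sketch, not the actual script: it assumes the kafka-python client, a topic named flights, and the import path of recent FlightRadarAPI versions; the real names may differ.

import json
import time

from FlightRadar24 import FlightRadar24API
from kafka import KafkaProducer

# Serialize each flight as a JSON message on the "flights" topic (assumed name).
producer = KafkaProducer(
    bootstrap_servers=["kafka1:9092", "kafka2:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
fr_api = FlightRadar24API()

while True:
    # Fetch the current live flights and publish one message per flight.
    for flight in fr_api.get_flights():
        producer.send("flights", {
            "id": flight.id,
            "latitude": flight.latitude,
            "longitude": flight.longitude,
            "altitude": flight.altitude,
        })
    producer.flush()
    time.sleep(10)  # polling interval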
  5. Start Spark ReadStream Session

In the third terminal:

> docker exec -it jupyter spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,org.mongodb.spark:mongo-spark-connector:10.0.5 rtprocessing.py
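For reference, the general shape of a job like rtprocessing.py is sketched below: read the Kafka topic as a structured stream, parse the JSON payload, and sink it into MongoDB through the connector named in the spark-submit packages. The topic, schema, database, and collection names here are assumptions, not the actual ones.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("rtprocessing").getOrCreate()

# Assumed message schema; the real script defines its own fields.
schema = StructType([
    StructField("id", StringType()),
    StructField("latitude", DoubleType()),
    StructField("longitude", DoubleType()),
    StructField("altitude", DoubleType()),
])

# Read the Kafka topic as a stream and unpack the JSON value column.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka1:9092,kafka2:9092")
    .option("subscribe", "flights")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("flight"))
    .select("flight.*")
)

# Sink the parsed records into MongoDB (Mongo Spark Connector 10.x syntax).
query = (
    stream.writeStream.format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://mongodb:27017")
    .option("spark.mongodb.database", "flights")
    .option("spark.mongodb.collection", "realtime")
    .option("checkpointLocation", "/tmp/checkpoints")
    .outputMode("append")
    .start()
)
query.awaitTermination()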
  6. Access the dashboard in Grafana at localhost:3000 | admin@password

  7. Stop the streaming

Whenever you want to stop data generation and processing, simply hit Ctrl+C in each active terminal.

  8. Shutdown the containers

To end the session, type in the first terminal:

> docker compose down

[Grafana dashboard screenshot]

Monitoring

  1. KAFKA CLUSTER

First, we need to execute a few commands to enable JMX Polling.

> docker exec -it zookeeper ./bin/zkCli.sh
$ create /kafka-manager/mutex ""
$ create /kafka-manager/mutex/locks ""
$ create /kafka-manager/mutex/leases ""
$ quit

After that, we can access the Kafka Manager UI at localhost:9000. Click on Cluster and then Add Cluster. Fill in the following fields:

  • Cluster Name = <insert a name>
  • Cluster Zookeeper Hosts = zookeeper:2181

Tick the following:

  • Enable JMX Polling
  • Poll consumer information

Leave the remaining fields at their defaults.
Hit Save.


  2. KAFKA-MONGO CONNECTOR

To check the status of the connector and the status of related tasks, execute the following command.

> curl localhost:8083/connectors/mongodb-connector/status | jq

If everything is working properly, you should see the state RUNNING for the connector and for each of its tasks.
Make sure you have jq installed; otherwise install it or remove it from the command.


  3. SPARK JOBS

To view the activity of Spark jobs, go to localhost:4040/jobs/.


  4. MONGODB

You can interact with the database and its collections in Mongo in two ways:

1. MongoDB Compass UI
2. > docker exec -it mongodb mongo
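Alternatively, from Python, a quick look at the data could look like the following minimal sketch, assuming pymongo and the host port from the containers table below; the database and collection names are assumptions, not necessarily the real ones.

from pymongo import MongoClient

# 27019 is the host-side port mapped to the mongodb container.
client = MongoClient("mongodb://localhost:27019")
db = client["flights"]  # assumed database name

print(db.list_collection_names())
for doc in db["realtime"].find().limit(5):  # assumed collection name
    print(doc)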

Components

Screenshots of the main components:

  • View data with Compass
  • Jupyter UI
  • Monitoring Spark Jobs
  • Live Data Visualization in Grafana

API

The project is based on data obtained through JeanExtreme002/FlightRadarAPI, an unofficial API for FlightRadar24 written in Python.
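A minimal usage sketch of the library is shown below, restricting the query to a zone of interest as the dashboard does. The import path and method names follow recent versions of FlightRadarAPI; check the version actually pinned by the project.

from FlightRadar24 import FlightRadar24API

fr_api = FlightRadar24API()

# Restrict the query to a zone of interest, e.g. Europe.
zone = fr_api.get_zones()["europe"]
bounds = fr_api.get_bounds(zone)

for flight in fr_api.get_flights(bounds=bounds):
    print(flight.id, flight.latitude, flight.longitude, flight.altitude)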

CONTAINERS

Container     | Image                              | Tag    | Internal Access | Access from host at | Credentials
Zookeeper     | wurstmeister/zookeeper             | latest | zookeeper:2181  |                     |
Kafka1        | wurstmeister/kafka                 |        | kafka1:9092     |                     |
Kafka2        | wurstmeister/kafka                 |        | kafka2:9092     |                     |
Kafka Manager | hlebalbau/kafka-manager            | stable |                 | localhost:9000      |
Kafka Connect | confluentinc/cp-kafka-connect-base |        |                 |                     |
Jupyter       | custom [Dockerfile]                |        |                 | localhost:8889      | token=easy
MongoDB       | mongo                              | 3.6    | mongodb:27017   | localhost:27019     |
Grafana       | custom [Dockerfile]                |        |                 | localhost:3000      | admin@password

KAFKA CONNECT

To enable sinking Kafka messages into Mongo, the official MongoDB Kafka connector, which provides both Sink and Source connectors, has been included in the folder specified by the plugin_path.
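For reference, registering such a sink connector through the Connect REST API could look like the sketch below. The connector class is the official com.mongodb.kafka.connect.MongoSinkConnector; the topic, database, and collection names are assumptions, and the project ships its actual config in connect/properties/mongo_connector_configs.json.

import requests

config = {
    "name": "mongodb-connector",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "tasks.max": "2",  # matches the two tasks seen in the status output
        "topics": "flights",              # assumed topic name
        "connection.uri": "mongodb://mongodb:27017",
        "database": "flights",            # assumed database name
        "collection": "realtime",         # assumed collection name
    },
}

resp = requests.post("http://localhost:8083/connectors", json=config)
resp.raise_for_status()
print(resp.json())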

GRAFANA

To enable MongoDB as a data source for Grafana, it was necessary to create a custom image, based on the official grafana/grafana-oss, that integrates the connector. The connector used is the unofficial one from JamesOsgood/mongodb-grafana.

The datasource and the dashboard are provisioned as well.

Troubleshooting

MongoDB Sink Kafka Connector

If, when curling the status of the connector, one of the following errors persists:

curl: (52) Empty reply from server
{
  "error_code": 404,
  "message": "No status found for connector mongodb-connector"
}

then, in the first case, you can try:

> docker restart kafka_connect

Wait about 30 seconds and retry curling.

In the second case:

> curl -X POST -H "Content-Type: application/json" --data @./connect/properties/mongo_connector_configs.json http://localhost:8083/connectors

and retry curling.