In this project, we use a real-time flight tracking API, Apache Kafka, Spark Streaming, Elasticsearch, and Kibana to build a real-time flight-info data pipeline and track flights as they happen. We describe the high-level architecture and the configuration needed to build this pipeline. The end result is a Kibana dashboard that reads real-time data from Elasticsearch.
Our project pipeline is as follows:
The following software should be installed on your machine in order to reproduce our work:
- Spark (spark-3.3.1-bin-hadoop2.7)
- Kafka (kafka_2.13-2.7.0)
- Elasticsearch (elasticsearch-7.14.2)
- Kibana (kibana-7.14.2)
- Python 3.9.6
We start by collecting real-time flight information (aircraft registration number, aircraft geo-latitude, aircraft geo-longitude, aircraft elevation, flight number, ...) and then send it to Kafka for analytics.
The data is ingested from the flight streaming API and published to a Kafka topic. You need to run the Kafka server with Zookeeper and create a dedicated topic for transporting the data.
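As a rough illustration of this ingestion step, the sketch below polls a flight API and publishes each record as JSON to a Kafka topic. It is not the project's `real-time-flights-producer.py`: the topic name `flights`, the placeholder API URL, the assumption that the API returns a JSON list, and the 30-second polling interval are all hypothetical, and the `run` function relies on the `kafka-python` package.

```python
import json
import time
import urllib.request

TOPIC = "flights"  # hypothetical topic name
API_URL = "https://example.com/flights?api_key=YOUR_KEY"  # placeholder, not a real endpoint


def fetch_flights(url=API_URL, timeout=10):
    """Download one snapshot of flights; assumes the API returns a JSON list."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def encode(flight):
    """Serialize one flight record to UTF-8 JSON bytes for Kafka."""
    return json.dumps(flight).encode("utf-8")


def publish(flights, send, topic=TOPIC):
    """Send every record through send(topic, value=...) and return the count."""
    count = 0
    for flight in flights:
        send(topic, value=encode(flight))
        count += 1
    return count


def run(poll_seconds=30):
    """Poll the API forever and forward each snapshot to Kafka."""
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    while True:
        publish(fetch_flights(), producer.send)
        producer.flush()  # make sure the batch has left the client
        time.sleep(poll_seconds)
```

Calling `run()` starts the polling loop against a broker on `localhost:9092`, the default Kafka listener. Passing `send` as a parameter keeps `publish` independent of the Kafka client, so it can be exercised without a running broker.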
In Spark Streaming, a Kafka consumer periodically collects the data in real time from the Kafka topic and writes it to an Elasticsearch index.
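The consumer side could look roughly like the following Structured Streaming sketch, which reads JSON records from Kafka, parses them, and writes them to Elasticsearch through the elasticsearch-spark connector. It is an illustration rather than the project's `spark_consumer.py`: the topic name, index name, checkpoint path, and field names are assumptions.

```python
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "flights",  # hypothetical topic name
    "startingOffsets": "latest",
}

ES_OPTIONS = {
    "es.nodes": "localhost",
    "es.port": "9200",
    "checkpointLocation": "/tmp/flights-checkpoint",  # hypothetical path
}

ES_INDEX = "flights"  # hypothetical index name

# (field name, Spark SQL type) pairs; the field names are assumed
FIELDS = [
    ("reg_number", "string"),
    ("flight_number", "string"),
    ("lat", "double"),
    ("lng", "double"),
    ("alt", "double"),
    ("dir", "double"),
    ("speed", "double"),
]


def schema_ddl(fields=FIELDS):
    """Build a DDL schema string such as '`lat` double, `lng` double'."""
    return ", ".join(f"`{name}` {dtype}" for name, dtype in fields)


def start_stream(spark):
    """Start the Kafka -> Elasticsearch streaming query and return it."""
    from pyspark.sql.functions import col, from_json

    raw = spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()
    # Kafka delivers the payload as bytes in the `value` column
    flights = raw.select(
        from_json(col("value").cast("string"), schema_ddl()).alias("f")
    ).select("f.*")
    return (
        flights.writeStream.format("es")
        .options(**ES_OPTIONS)
        .outputMode("append")
        .start(ES_INDEX)
    )
```

With a session created by `SparkSession.builder.appName("flights-consumer").getOrCreate()`, calling `start_stream(spark).awaitTermination()` keeps the query running. The Kafka source and the `es` sink are only available when the job is launched with the matching `--packages`.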
You need to enable and start Elasticsearch so that it can store the real-time flight information for later visualization. You can navigate to http://localhost:9200 to check that it is up and running.
Kibana is a visualization tool for exploring the data stored in Elasticsearch. In our project, instead of printing the results directly, we use Kibana to visualize the streaming data in real time. You can navigate to http://localhost:5601 to check that it is up and running.
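Instead of opening both URLs in a browser, the same check can be scripted. The helper below is a small sketch that uses only the standard library; the function names are illustrative, and it treats any connection error or non-2xx answer (for example a 401 when security is enabled) as "down".

```python
import urllib.request

SERVICES = {
    "Elasticsearch": "http://localhost:9200",
    "Kibana": "http://localhost:5601",
}


def is_up(url, timeout=3):
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers refused connections, timeouts and HTTP errors
        # (urllib's URLError and HTTPError are OSError subclasses).
        return False


def report(services=SERVICES, check=is_up):
    """Map each service name to True (reachable) or False."""
    return {name: check(url) for name, url in services.items()}
```

Running `print(report())` after starting the services shows at a glance which of the two is not yet reachable.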
- Start Elasticsearch
sudo systemctl start elasticsearch && sudo systemctl enable elasticsearch
- Start Kibana
sudo systemctl start kibana && sudo systemctl enable kibana
- Start the Zookeeper server from the root of the Kafka installation directory (the script ships with Kafka):
./bin/zookeeper-server-start.sh ./config/zookeeper.properties
- Start the Kafka server from the root of the Kafka installation directory:
./bin/kafka-server-start.sh ./config/server.properties
- Run the Kafka producer:
python3 ./real-time-flights-producer.py
- Run the PySpark consumer with spark-submit:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.1,org.elasticsearch:elasticsearch-spark-30_2.12:7.14.2 /home/sirine/Downloads/spark_consumer.py
- Open http://localhost:5601/ in your browser.
- Go to Management > Kibana > Saved Objects
- Import Real-Time-Flight-Tracking-Project-Dashbord.ndjson
- Open the dashboard
- A pie chart of aircraft heading (dir) vs. aircraft registration number (reg_number), together with a count of the flights tracked in real time
- Vertical bar charts of aircraft horizontal speed (km/h) vs. aircraft elevation (meters) and of aircraft horizontal speed (km/h) vs. aircraft geo-latitude
- A horizontal bar chart of the different aircraft horizontal speeds (km/h) and a heat map of aircraft elevation (meters) vs. aircraft heading
- A line chart of aircraft horizontal speed (km/h) vs. aircraft geo-longitude
- A map that geolocates the different flights all over the world in real time
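Kibana maps plot fields of type `geo_point`, so the index needs such a field for the map panel to work. The sketch below shows one way to provide it, and is not necessarily how this project does it: an index mapping that declares a `location` field, and a helper that derives that field from separate latitude and longitude values. The field names `location`, `lat`, and `lng` are assumptions.

```python
# Body for creating the index (e.g. PUT /flights) before any document is written;
# the `location` field name is hypothetical.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "location": {"type": "geo_point"},
        }
    }
}


def with_location(flight, lat_key="lat", lon_key="lng"):
    """Return a copy of the record with a geo_point-shaped `location` field.

    Records without both coordinates are copied unchanged.
    """
    doc = dict(flight)
    lat, lon = doc.get(lat_key), doc.get(lon_key)
    if lat is not None and lon is not None:
        doc["location"] = {"lat": float(lat), "lon": float(lon)}
    return doc
```

For example, `with_location({"reg_number": "X1", "lat": 36.8, "lng": 10.2})` adds `"location": {"lat": 36.8, "lon": 10.2}`, which Elasticsearch accepts as a `geo_point` value once the mapping is in place.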