Group:
- Salma Seddik
- Naima Attia
- Med Mongi Saidane
This project consists of:
- A streaming application that collects data from public APIs and publishes it to a Kafka queue (a minimal producer sketch follows this list).
- The same service as above, but streaming over a TCP socket instead of Kafka.
- A Spark streaming pipeline that consumes data from Kafka, transforms it, performs some calculations, and stores the results in both the Hadoop File System (HDFS) and MongoDB (see the second sketch after this list).
- A Spark batch processing pipeline that reads a CSV file, transforms the data, and writes it to HDFS and MongoDB.
- A Grafana service that reads data from MongoDB and displays multiple graphs, some of which show real-time data.
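To illustrate the first component, here is a minimal sketch of an API-to-Kafka producer using `requests` and `kafka-python`. The API URL, broker address, topic name, and polling interval are placeholders, not the values used by the actual service.

```python
import json
import time

import requests
from kafka import KafkaProducer

# Placeholder endpoint, broker, and topic names; the real service defines its own.
API_URL = "https://api.example.com/data"
KAFKA_BOOTSTRAP = "kafka:9092"
TOPIC = "raw-events"

producer = KafkaProducer(
    bootstrap_servers=KAFKA_BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Poll the public API and forward each record to the Kafka topic.
    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()
    for record in response.json():
        producer.send(TOPIC, value=record)
    producer.flush()
    time.sleep(5)
```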
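And a rough sketch of the shape of the Spark streaming pipeline, assuming the Kafka source package (`spark-sql-kafka`) and the MongoDB Spark Connector (v10+) are available; the topic, schema, HDFS path, and MongoDB database/collection below are illustrative placeholders, not the project's actual configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-streaming-pipeline").getOrCreate()

# Placeholder schema for the JSON messages published by the streaming app.
schema = StructType([
    StructField("sensor", StringType()),
    StructField("value", DoubleType()),
])

# Read raw messages from the Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "raw-events")
       .load())

# Parse the payload and compute a simple aggregation as an example calculation.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("data"))
             .select("data.*"))
averages = parsed.groupBy("sensor").agg(avg("value").alias("avg_value"))

def write_batch(batch_df, batch_id):
    # Persist each micro-batch to HDFS (Parquet) and to MongoDB.
    batch_df.write.mode("append").parquet("hdfs://namenode:9000/data/averages")
    (batch_df.write.format("mongodb")
        .mode("append")
        .option("connection.uri", "mongodb://mongo:27017")
        .option("database", "metrics")
        .option("collection", "averages")
        .save())

query = averages.writeStream.outputMode("update").foreachBatch(write_batch).start()
query.awaitTermination()
```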
Technologies used:
- Python (streaming services, Spark streaming & batch processing)
- Grafana (Data display)
- MongoDB
- Hadoop FS
- Spark
- Kafka
- Docker & Docker compose
To run everything, use `up.sh`. To reset/shut down, use `reset.sh`.