Group:
- Salma Seddik
- Naima Attia
- Med Mongi Saidane
This project consists of:
- A streaming application that collects data from public APIs and publishes it to a Kafka queue (a minimal producer sketch follows this list).
- The same service as above, but streaming over a TCP socket instead of Kafka.
- A Spark streaming pipeline that consumes data from Kafka, transforms it, performs some calculations, and stores the results in both the Hadoop File System (HDFS) and MongoDB (see the second sketch after this list).
- A Spark batch processing pipeline that reads a CSV file, transforms the data, and writes it to HDFS and MongoDB.
- A Grafana service that reads data from MongoDB and displays multiple graphs, some of which show real-time data.
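To illustrate the first component, here is a minimal sketch of an API-to-Kafka producer using `requests` and `kafka-python`. The API URL, broker address, topic name, and polling interval are placeholders, not the values used by the actual service.

```python
import json
import time

import requests
from kafka import KafkaProducer

# Placeholder endpoint, broker, and topic names; the real service defines its own.
API_URL = "https://api.example.com/data"
KAFKA_BOOTSTRAP = "kafka:9092"
TOPIC = "raw-events"

producer = KafkaProducer(
    bootstrap_servers=KAFKA_BOOTSTRAP,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Poll the public API and forward each record to the Kafka topic.
    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()
    for record in response.json():
        producer.send(TOPIC, value=record)
    producer.flush()
    time.sleep(5)
```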
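And a rough sketch of the shape of the Spark streaming pipeline, assuming the Kafka source package (`spark-sql-kafka`) and the MongoDB Spark Connector (v10+) are available; the topic, schema, HDFS path, and MongoDB database/collection below are illustrative placeholders, not the project's actual configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-streaming-pipeline").getOrCreate()

# Placeholder schema for the JSON messages published by the streaming app.
schema = StructType([
    StructField("sensor", StringType()),
    StructField("value", DoubleType()),
])

# Read raw messages from the Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "raw-events")
       .load())

# Parse the payload and compute a simple aggregation as an example calculation.
parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("data"))
             .select("data.*"))
averages = parsed.groupBy("sensor").agg(avg("value").alias("avg_value"))

def write_batch(batch_df, batch_id):
    # Persist each micro-batch to HDFS (Parquet) and to MongoDB.
    batch_df.write.mode("append").parquet("hdfs://namenode:9000/data/averages")
    (batch_df.write.format("mongodb")
        .mode("append")
        .option("connection.uri", "mongodb://mongo:27017")
        .option("database", "metrics")
        .option("collection", "averages")
        .save())

query = averages.writeStream.outputMode("update").foreachBatch(write_batch).start()
query.awaitTermination()
```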
Technologies used:
- Python (streaming services, Spark streaming & batch processing)
- Grafana (Data display)
- MongoDB
- Hadoop FS
- Spark
- Kafka
- Docker & Docker compose
To run everything, use `up.sh`. To reset/shut down, use `reset.sh`.