Hi 👋, I'm Siddhesh Kankal

A passionate data engineer

Kafka-project-2-on-Exam-result-streaming-analytics-

Connect with me:

https://www.linkedin.com/in/siddhesh-kankal-bhavsar-20101996

Languages and Tools:

Docker, Hadoop, Hive, pandas, Python

Table of contents

General info

This project performs streaming analytics on students' exam results.

Technologies

The project is built with:

  • Confluent Kafka
  • PySpark
  • Python

Architecture

architecture

Detailed explanation: The raw input is a CSV file (exams.csv) containing each student's metadata and marks: gender, group, reading score, writing score, math score, and so on. The producer code sends these records to a Kafka topic. Because this is a POC project, the Kafka setup is a single-node cluster. On the other side, the consumer code subscribes to the topic and reads the records. The first consumer consolidates the records group-wise. A second consumer can subscribe to the same topic at the same time; it segregates the records into passing and failing students within each group. Each result set is written to a separate Excel file.
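The two consumer-side steps described above can be sketched in plain Python, without any Kafka code. This is only an illustration, not the repository's actual logic: the column names (`group`, `math score`, `reading score`, `writing score`), the choice of averaging as the "consolidation", and the pass mark of 40 are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Assumed column names; adjust to match the real exams.csv header.
SCORE_FIELDS = ("math score", "reading score", "writing score")
# Hypothetical pass mark; the project description does not state one.
PASS_MARK = 40


def consolidate_by_group(records):
    """First consumer's idea: average each score field per group."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["group"]].append(rec)
    return {
        group: {field: mean(float(r[field]) for r in rows) for field in SCORE_FIELDS}
        for group, rows in buckets.items()
    }


def split_pass_fail(records, pass_mark=PASS_MARK):
    """Second consumer's idea: separate pass and fail records within each group."""
    result = defaultdict(lambda: {"pass": [], "fail": []})
    for rec in records:
        # A student passes only if every score reaches the pass mark.
        passed = all(float(rec[field]) >= pass_mark for field in SCORE_FIELDS)
        result[rec["group"]]["pass" if passed else "fail"].append(rec)
    return dict(result)
```

In a real consumer, `records` would be the decoded messages read from the topic, and each returned structure could then be written out to its own file.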

Setup

To run this project, install it locally:

Then run the producer from the CLI:

$ python kafka_producer.py

Then, in two other separate command prompts, run one consumer in each:

$ python kafka_consumer.py

$ python kafka_consumer2.py
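For orientation, a producer script along the lines of kafka_producer.py might look like the sketch below. It is not the repository's actual code. It assumes the `confluent_kafka` Python client, a broker at `localhost:9092`, a `group` column used as the message key, and JSON-encoded values; the function names `encode_row` and `produce_csv` are hypothetical.

```python
import csv
import json


def encode_row(row):
    """Turn one CSV row (a dict) into a (key, value) pair of bytes."""
    # Keying by the assumed "group" column keeps a group's records in one partition.
    key = row.get("group", "").encode("utf-8")
    value = json.dumps(row).encode("utf-8")
    return key, value


def produce_csv(path, topic, bootstrap_servers="localhost:9092"):
    """Send every row of the CSV file at `path` to the Kafka topic `topic`."""
    # Imported here so encode_row stays usable without the Kafka client installed.
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": bootstrap_servers})
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            key, value = encode_row(row)
            producer.produce(topic, key=key, value=value)
            producer.poll(0)  # let the client process delivery events
    producer.flush()  # block until all queued messages are delivered
```

Calling `produce_csv("exams.csv", "exam-results")` would send the file row by row; the topic name `exam-results` is a placeholder for whatever topic the consumers subscribe to.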