- 🔭 I’m currently working on Kafka-project-2-on-Exam-result-streaming-analytics-
- 👨‍💻 All of my projects are available at https://github.com/siddheshkankal
- 📝 I regularly write articles on the data engineering tech stack
- 📫 How to reach me: dksidd96@gmail.com

This project performs streaming analytics on students' exam results.

The project is built with:
- Confluent Kafka
- PySpark
- Python

Detailed explanation: the raw input is a CSV file (`exams.csv`) containing every student's metadata and marks: gender, group, reading score, writing score, math score, and so on.

- A producer (`kafka_producer.py`) reads the file and sends (produces) each record to a Kafka topic. Because this is a proof-of-concept, Kafka runs as a single-node cluster.
- A first consumer subscribes to the topic and consolidates the records group-wise.
- At the same time, a second consumer subscribes to the same topic and separates the students of each group into those who passed and those who failed.
- Each consumer writes its result set to a separate Excel file.
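The producer step described above can be sketched roughly as follows. This is a minimal illustration, not the code in `kafka_producer.py`: it assumes the CSV headers are literally `group`, `math score`, `reading score` and `writing score`, and the topic name `exams` and broker address `localhost:9092` are hypothetical. The Kafka call is kept out of the core logic, so `produce_csv` accepts any `send(key, value)` callable and the parsing can be tried without a running broker.

```python
import csv
import json
from typing import Callable, Iterable, Iterator, Tuple

# Assumed CSV headers for the three marks (hypothetical; check exams.csv).
SCORE_COLUMNS = ("math score", "reading score", "writing score")


def rows_to_messages(rows: Iterable[dict]) -> Iterator[Tuple[bytes, bytes]]:
    """Turn CSV rows into (key, value) byte pairs for a Kafka producer.

    The key is the student's group; the value is the whole row as JSON,
    with the marks converted from strings to integers.
    """
    for row in rows:
        record = dict(row)
        for col in SCORE_COLUMNS:
            record[col] = int(record[col])
        yield record["group"].encode("utf-8"), json.dumps(record).encode("utf-8")


def produce_csv(csv_lines: Iterable[str], send: Callable[[bytes, bytes], None]) -> int:
    """Send every data row of a CSV source via `send` and return the count."""
    count = 0
    for key, value in rows_to_messages(csv.DictReader(csv_lines)):
        send(key, value)
        count += 1
    return count


# Hypothetical wiring with the confluent-kafka client and a local broker:
#   from confluent_kafka import Producer
#   producer = Producer({"bootstrap.servers": "localhost:9092"})
#   with open("exams.csv", newline="") as f:
#       produce_csv(f, lambda k, v: producer.produce("exams", key=k, value=v))
#   producer.flush()  # block until all queued messages are delivered
```

Keying each message by group means that, with the client's default key-hashing partitioner, all records of one group land in the same partition, so a group's records stay in order even if the topic is later given more partitions than this single-node setup needs.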
To run this project, clone it locally and make sure the single-node Kafka cluster is running. Then start the producer from the command line:

```shell
$ python kafka_producer.py
```

and run each consumer in its own separate terminal:

```shell
$ python kafka_consumer.py
```

```shell
$ python kafka_consumer2.py
```
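The consumer-side logic from the detailed explanation (group-wise consolidation in one consumer, pass/fail separation within each group in the other) might look like the sketch below. It is an illustration under stated assumptions, not the project's actual consumer code: the column names are assumed as in the producer description, the pass rule (at least `PASS_MARK = 40` in every subject) is hypothetical because the README does not define "pass", and the topic name, broker address and `group.id` values in the commented consumer loop are placeholders.

```python
import csv
import json
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumptions: column names as listed in the README, and a hypothetical
# pass mark of 40 in every subject (the README does not define "pass").
SCORE_COLUMNS = ("math score", "reading score", "writing score")
PASS_MARK = 40


def decode(value: bytes) -> dict:
    """Decode one Kafka message value that was produced as JSON."""
    return json.loads(value.decode("utf-8"))


def consolidate_by_group(records: Iterable[dict]) -> Dict[str, dict]:
    """Group-wise view: student count and average of each mark per group."""
    counts: Dict[str, int] = defaultdict(int)
    sums: Dict[str, dict] = defaultdict(lambda: dict.fromkeys(SCORE_COLUMNS, 0))
    for r in records:
        group = r["group"]
        counts[group] += 1
        for col in SCORE_COLUMNS:
            sums[group][col] += r[col]
    return {
        group: {"students": counts[group],
                **{col: sums[group][col] / counts[group] for col in SCORE_COLUMNS}}
        for group in sorted(counts)
    }


def split_pass_fail(records: Iterable[dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Within each group, separate passing and failing students."""
    result: Dict[str, Dict[str, List[dict]]] = defaultdict(lambda: {"pass": [], "fail": []})
    for r in records:
        passed = all(r[col] >= PASS_MARK for col in SCORE_COLUMNS)
        result[r["group"]]["pass" if passed else "fail"].append(r)
    return dict(result)


def write_rows(path: str, rows: Iterable[dict]) -> None:
    """Dump rows to a CSV file (which Excel can open); do nothing if empty."""
    rows = list(rows)
    if not rows:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical consumer loop (confluent-kafka client, local broker):
#   from confluent_kafka import Consumer
#   consumer = Consumer({"bootstrap.servers": "localhost:9092",
#                        "group.id": "group-wise",  # use a different id in consumer 2
#                        "auto.offset.reset": "earliest"})
#   consumer.subscribe(["exams"])
#   records = []
#   while (msg := consumer.poll(5.0)) is not None:  # stop after 5 s with no message
#       if not msg.error():
#           records.append(decode(msg.value()))
#   consumer.close()
```

Giving the two consumers different `group.id` values is what lets both of them receive every record on the topic. If they shared one id, Kafka would divide the topic's partitions between them, and with a single partition one of the two consumers would receive nothing.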