Real-Time Data Processing with Kafka

Overview

This project demonstrates real-time data processing using Apache Kafka. The system generates data and writes it to CSV files, produces that data to Kafka, consumes it from Kafka, and stores the aggregated results in MongoDB or PostgreSQL. In the most recent test, the system sustained high throughput, processing more than 1,400,000 records in 30 seconds.

Project Structure

  • KafkaPyProject: Contains the Kafka producer and consumer logic.
  • DatabasePyProject: Handles connections and operations with MongoDB and PostgreSQL.
  • TransactionGeneratorPyProject: Generates data in real-time and saves it to CSV files.

Libraries Used

  • pandas~=2.2.2
  • pymongo~=4.8.0
  • psycopg2~=2.9.9
  • confluent-kafka~=2.5.0

Setup

Prerequisites

  • Python 3.9 or above (required by pandas 2.2)
  • Apache Kafka
  • MongoDB
  • PostgreSQL (optional)

Installation

  1. Clone the Repository

    git clone https://github.com/your-repository-url.git
    cd your-repository-directory
  2. Install Dependencies

    pip install -r requirements.txt
  3. Kafka Configuration

    • Update the Kafka broker configuration (server.properties) to accept larger messages:
      message.max.bytes=10485760  # 10 MB
    • Restart the Kafka broker so the change takes effect.
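
A matching limit is also needed on the client side, or the broker setting alone will not help. The snippet below is a minimal sketch assuming a local broker; the bootstrap address and group id are placeholders, not the project's actual configuration:

    # Sketch: client-side limits matching the broker's message.max.bytes.
    # Bootstrap address and group id are placeholders.
    from confluent_kafka import Producer, Consumer

    producer = Producer({
        "bootstrap.servers": "localhost:9092",
        "message.max.bytes": 10485760,   # allow messages up to 10 MB
    })

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "transactions-group",        # placeholder group id
        "fetch.message.max.bytes": 10485760,     # fetch batches up to 10 MB
        "auto.offset.reset": "earliest",
    })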

Running the Project

  1. Generate Data

    python TransactionGeneratorPyProject/generate_transactions.py
  2. Produce Data to Kafka (see the producer sketch after this list)

    python KafkaPyProject/producer.py
  3. Consume Data from Kafka (see the consumer sketch after this list)

    python KafkaPyProject/consumer.py

  4. Run the Main Project

    python main.py
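
The repository's scripts are the source of truth for steps 2 and 3; purely as an illustration, a stripped-down producer/consumer pair might look like the sketch below. The topic name, CSV path, and column layout are assumptions:

    # Illustration of steps 2 and 3 in one place; in the project these
    # live in KafkaPyProject/producer.py and consumer.py respectively.
    import json
    import pandas as pd
    from confluent_kafka import Producer, Consumer

    TOPIC = "transactions"  # placeholder topic name

    # Producer side: stream CSV rows to Kafka as JSON messages.
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    for chunk in pd.read_csv("transactions.csv", chunksize=10_000):
        for record in chunk.to_dict(orient="records"):
            producer.produce(TOPIC, value=json.dumps(record))
        producer.flush()  # block until the chunk is delivered

    # Consumer side: read messages back and decode them.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "transactions-group",  # placeholder group id
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([TOPIC])
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        record = json.loads(msg.value())
        # ... aggregate and hand off to the database layer here ...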

Problems Addressed

  • Handling large messages on both the producer and consumer sides.
  • Reading and processing data from CSV files in real time.

Database Options

  • MongoDB
  • PostgreSQL

Switching Between Databases

Modify the consumer configuration to choose between MongoDB and PostgreSQL.
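
As an illustration, a minimal selector might look like the sketch below, assuming the backend is picked via a DB_BACKEND environment variable; the variable name and the database, collection, and table names are assumptions, not the project's actual configuration:

    # Sketch: selecting the database sink from an environment variable.
    # DB_BACKEND and all database/collection/table names are assumptions.
    import os
    import json
    import psycopg2
    from pymongo import MongoClient

    DB_BACKEND = os.getenv("DB_BACKEND", "mongodb")  # "mongodb" or "postgresql"

    if DB_BACKEND == "mongodb":
        collection = MongoClient("mongodb://localhost:27017")["transactions"]["aggregates"]

        def store(record):
            collection.insert_one(record)
    else:
        conn = psycopg2.connect("dbname=transactions user=postgres host=localhost")

        def store(record):
            with conn.cursor() as cur:
                cur.execute("INSERT INTO aggregates (payload) VALUES (%s)",
                            (json.dumps(record),))
            conn.commit()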