This project demonstrates real-time data processing with Apache Kafka. The system generates data into CSV files, produces it to Kafka, consumes it from Kafka, and stores the aggregated results in MongoDB or PostgreSQL. It handles high throughput: in the most recent test it processed more than 1,400,000 records in 30 seconds.
- KafkaPyProject: Contains the Kafka producer and consumer logic.
- DatabasePyProject: Handles connections and operations with MongoDB and PostgreSQL.
- TransactionGeneratorPyProject: Generates data in real-time and saves it to CSV files.
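Based on the scripts referenced in the steps below, the repository layout is roughly as follows (the contents of DatabasePyProject are not spelled out in this README):

```
.
├── KafkaPyProject/
│   ├── producer.py
│   └── consumer.py
├── DatabasePyProject/
├── TransactionGeneratorPyProject/
│   └── generate_transactions.py
├── main.py
└── requirements.txt
```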
```
pandas~=2.2.2
pymongo~=4.8.0
psycopg2~=2.9.9
confluent-kafka~=2.5.0
```
- Python 3.9 or above (required by pandas 2.2)
- Apache Kafka
- MongoDB
- PostgreSQL (optional)
To set up the project:

1. Clone the repository:

   ```bash
   git clone https://github.com/your-repository-url.git
   cd your-repository-directory
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure Kafka:

   - Update the Kafka broker configuration (`server.properties`):

     ```
     # Allow messages up to 10 MB
     message.max.bytes=10485760
     ```

   - Restart the Kafka broker. A matching client-side configuration is sketched after this list.
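With a larger broker limit, the confluent-kafka clients generally need matching settings. A minimal sketch, assuming a local broker; the property names come from librdkafka, but the values and identifiers are placeholders:

```python
from confluent_kafka import Consumer, Producer

# Client-side limits aligned with the broker's message.max.bytes.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "message.max.bytes": 10485760,  # largest message the producer will send
})

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "transactions-consumer",       # placeholder group id
    "fetch.message.max.bytes": 10485760,       # largest per-partition fetch
})
```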
To run the pipeline (illustrative sketches of the generator, producer, and consumer scripts follow this list):

1. Generate data:

   ```bash
   python TransactionGeneratorPyProject/generate_transactions.py
   ```

2. Produce data to Kafka:

   ```bash
   python KafkaPyProject/producer.py
   ```

3. Consume data from Kafka:

   ```bash
   python KafkaPyProject/consumer.py
   ```

4. Run the main project:

   ```bash
   python main.py
   ```
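A minimal sketch of what the generator step might look like, using pandas from the requirements list. The column names, row count, and output file name are illustrative, not the project's actual schema:

```python
import random
import time

import pandas as pd

def generate_batch(n_rows: int) -> pd.DataFrame:
    """Build a batch of fake transactions (illustrative schema)."""
    now = time.time()
    rows = [
        {
            "transaction_id": i,
            "amount": round(random.uniform(1.0, 500.0), 2),
            "timestamp": now,
        }
        for i in range(n_rows)
    ]
    return pd.DataFrame(rows)

if __name__ == "__main__":
    # Write a CSV for the producer step to pick up.
    generate_batch(10_000).to_csv("transactions.csv", index=False)
```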
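The producer step reads the CSV and publishes each record to a topic. A sketch with confluent-kafka; the topic name, file name, and JSON encoding are assumptions:

```python
import json

import pandas as pd
from confluent_kafka import Producer

TOPIC = "transactions"  # assumed topic name

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "message.max.bytes": 10485760,  # match the broker limit configured above
})

def on_delivery(err, msg):
    # Report failed deliveries; successes are acknowledged silently.
    if err is not None:
        print(f"Delivery failed: {err}")

df = pd.read_csv("transactions.csv")
for record in df.to_dict(orient="records"):
    producer.produce(TOPIC, value=json.dumps(record).encode("utf-8"),
                     on_delivery=on_delivery)
    producer.poll(0)  # serve delivery callbacks without blocking

producer.flush()  # block until every queued message is delivered
```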
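On the consuming side, messages are read from the topic, batched, and written to the database. A sketch for the MongoDB path; the group id, database, and collection names are placeholders, and the batch size of 1,000 is arbitrary:

```python
import json

from confluent_kafka import Consumer
from pymongo import MongoClient

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "transactions-consumer",   # placeholder group id
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["transactions"])

collection = MongoClient("mongodb://localhost:27017")["transactions_db"]["aggregates"]

batch = []
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        batch.append(json.loads(msg.value()))
        if len(batch) >= 1000:
            collection.insert_many(batch)  # bulk insert into MongoDB
            batch.clear()
finally:
    consumer.close()
```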
- Handles large messages in both the producer and the consumer.
- Reads and processes data from CSV files in real time.
- MongoDB
- PostgreSQL
Modify the consumer configuration to choose between MongoDB and PostgreSQL.
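A hedged sketch of what such a toggle could look like; `DB_BACKEND` and the connection parameters are hypothetical, not the project's actual option names:

```python
import psycopg2
from pymongo import MongoClient

DB_BACKEND = "mongodb"  # hypothetical switch: "mongodb" or "postgresql"

if DB_BACKEND == "mongodb":
    db = MongoClient("mongodb://localhost:27017")["transactions_db"]
else:
    conn = psycopg2.connect(
        host="localhost", dbname="transactions_db",
        user="postgres", password="postgres",  # placeholder credentials
    )
```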