- Description
- Technologies Used
- Architecture Overview
- Setup and Run Instructions
- Monitoring and Logs
- References
- Troubleshooting
- Contributing
- License
## Description

The Wikimedia Data Processing project ingests and processes large volumes of real-time streaming data from Wikimedia. It uses Kafka for efficient, scalable data ingestion and processing, and MySQL for storage, enabling analytical insights over the collected events.
- Real-Time Data Ingestion: Utilizes a Kafka producer to stream real-time data from Wikimedia.
- Data Processing: Consumes and processes real-time data via a Kafka consumer.
- Data Storage: Persists processed data into a MySQL database.
- Scalability: Built with Spring Boot microservices architecture for scalability and maintainability.
To access the real-time stream data from Wikimedia, visit: Wikimedia Stream
## Technologies Used

- Java: Version 11
- Spring Boot: Version 2.5.12
- Apache Kafka: Distributed streaming platform
- Apache Maven: Version 3.8.4 for build automation
- MySQL: Relational Database Management System
- Docker: Version 27.0.3 for containerization
- IntelliJ IDEA: 2024.1.4 (Community Edition)
- Docker Desktop: Version 4.31.1
## Architecture Overview

The project follows a microservices architecture and comprises two main components:
- Wikimedia Data Producer: A Kafka producer that reads real-time data from Wikimedia and publishes it to a Kafka topic.
- Wikimedia Data Consumer: A Kafka consumer that reads data from the Kafka topic and stores it in a MySQL database.
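The producer → topic → consumer flow can be sketched in plain Java. This is an illustrative model only: the real project uses Spring Boot with an actual Kafka broker and MySQL, whereas here an in-memory `BlockingQueue` stands in for the Kafka topic and a `List` stands in for the database; the class and variable names are assumptions, not taken from the repository.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of the producer -> topic -> consumer pipeline.
// A BlockingQueue stands in for the Kafka topic, a List for the MySQL table.
public class PipelineSketch {

    public static List<String> run(List<String> events) throws InterruptedException {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();            // "Kafka topic"
        List<String> store = Collections.synchronizedList(new ArrayList<>()); // "MySQL table"

        // Producer: publishes each Wikimedia event to the topic.
        Thread producer = new Thread(() -> events.forEach(topic::add));

        // Consumer: takes events off the topic and persists them.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < events.size(); i++) {
                    store.add(topic.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return store;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> stored = run(List.of("{\"type\":\"edit\"}", "{\"type\":\"new\"}"));
        System.out.println("stored " + stored.size() + " events");
    }
}
```

Because a single producer feeds a FIFO queue read by a single consumer, events arrive in the store in publication order — the same ordering guarantee Kafka gives within one topic partition.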
## Setup and Run Instructions

### Prerequisites

- Install Docker and Docker Compose
- Install Java 11
- Install Maven 3.8.4
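Before cloning, you can sanity-check that the required tools are on your `PATH`. A minimal sketch (it checks tool names only, not the specific versions listed above):

```shell
# Check that the build/runtime tools used by this project are installed.
missing=""
for tool in java mvn docker; do
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -z "$missing" ]; then
  echo "environment OK"
else
  echo "missing tools:$missing"
fi
```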
- Clone the Repository:

  ```shell
  git clone https://github.com/tienhuynh-tn/Wikimedia-data-processing.git
  cd Wikimedia-data-processing
  ```
- Build the Project:

  ```shell
  mvn clean install
  ```
- Run Docker Services:
  - Start Kafka UI:

    ```shell
    docker compose -f ./docker/kafka-ui.yml up -d
    ```

  - Start MySQL Database:

    ```shell
    docker compose -f ./docker/mysql.yml up -d
    ```
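For reference, a MySQL compose file for this kind of setup typically looks like the sketch below. This is illustrative only, not the file shipped in `./docker/` — the image tag, credentials, and database name are assumptions, so check the repository's own compose files:

```yaml
# Illustrative only: a minimal MySQL service definition.
services:
  mysql:
    image: mysql:8.0
    container_name: wikimedia-mysql
    environment:
      MYSQL_ROOT_PASSWORD: root     # assumption: use the project's real credentials
      MYSQL_DATABASE: wikimedia     # assumption: database name
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql
volumes:
  mysql-data:
```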
- Run Services in IntelliJ:
  - Open IntelliJ, import the project, and run the main classes for both producer and consumer modules.
  - Edit the `application.yml` configuration: update it to match the Docker environment (comment/uncomment properties as necessary).
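The properties to toggle are typically along these lines — an illustrative sketch of a Spring Boot `application.yml`, where the host names, ports, database name, and credentials are assumptions and the repository's actual file may differ:

```yaml
spring:
  kafka:
    # Inside Docker Compose, point at the broker's service name instead of localhost.
    bootstrap-servers: localhost:9092            # e.g. kafka:9092 when running in Docker
  datasource:
    url: jdbc:mysql://localhost:3306/wikimedia   # e.g. mysql:3306 when running in Docker
    username: root                               # assumption: use the project's real credentials
    password: root
```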
- Navigate to the Docker Directory:

  ```shell
  cd ./docker/
  ```
- Build Docker Images:

  ```shell
  docker compose build --no-cache
  ```
- Start All Services:

  ```shell
  docker compose up -d
  ```
- View Application Logs:

  ```shell
  docker logs -f wikimedia-data-processing
  ```
## Monitoring and Logs

- Kafka UI: Access Kafka UI at `http://localhost:8080` to monitor Kafka topics and messages.
- Application Logs: Use `docker logs -f <container_name>` to view real-time logs for each service.
## Troubleshooting

- Kafka Broker Not Starting: Ensure the `zookeeper` service is running before starting the Kafka broker.
- Connection Refused to MySQL: Check the MySQL host and port configuration, and verify the Docker network settings.
- Kafka Connection Issues: When running with Docker Compose, you may initially see `Connection to node -1 (localhost/127.0.0.1:9092) could not be established` errors. This is expected; the services should stabilize after a few minutes.
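If the connection errors persist, a common cause is the broker advertising a hostname that clients cannot resolve. A typical remedy in a Docker Compose Kafka service looks like the sketch below — illustrative only, with service names and ports as assumptions; adapt it to the compose files in `./docker/`:

```yaml
# Illustrative only: broker listener settings for mixed host/container access.
services:
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    environment:
      # Containers connect via kafka:29092; the host connects via localhost:9092.
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
```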
## Contributing

Contributions are welcome! Please submit a pull request or open an issue to get started.
## License

© 2024 tienhuynh-tn. This project is licensed under the MIT License.