The Amazon Metadata dataset is a collection of product information stored in JSON format. It includes various attributes such as product ID, title, features, description, price, image URLs, related products, sales rank, brand, categories, and technical details.
You can download the Amazon Metadata dataset from here.
Python Libraries:
Software:
- Apache Kafka: Download and setup instructions here.
- Python: Download and installation guide here.
Project Overview:
- Sampling and Preprocessing the Dataset
- Setting up Streaming Pipeline
- Implementing Frequent Itemset Mining Algorithms
- Integrating with Database
- Bash Script for Enhanced Project Execution
Sampling and Preprocessing:
- Download the Amazon Metadata dataset.
- Execute preprocess.ipynb to sample and preprocess the dataset.
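The sampling step can be approximated with a few lines of Python. This is a minimal sketch, not the contents of preprocess.ipynb: it assumes the metadata file stores one JSON product object per line, and the field names `asin` and `also_buy` as well as the helpers `sample_records` and `to_basket` are hypothetical choices for illustration. Each product plus its related products is turned into a "basket", the input shape that frequent itemset mining expects.

```python
import json
from typing import Iterator, List


def sample_records(path: str, max_records: int) -> Iterator[dict]:
    """Yield at most max_records parsed products from a line-delimited JSON file.

    Assumes one JSON object per line; blank or malformed lines are skipped.
    """
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if count >= max_records:
                break
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            yield record
            count += 1


def to_basket(record: dict, related_field: str = "also_buy") -> List[str]:
    """Build one basket: the product ID plus its related product IDs.

    The field names 'asin' and 'also_buy' are assumptions about the schema.
    """
    items = {record["asin"]} if "asin" in record else set()
    items.update(record.get(related_field, []))
    return sorted(items)
```

For example, `[to_basket(r) for r in sample_records("meta.json", 10000)]` would produce up to 10,000 baskets from a hypothetical file `meta.json`. Reading lazily line by line keeps memory use bounded even when the full dataset is many gigabytes.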
Streaming Pipeline Setup:
- Develop a producer application (producer.py) to stream the preprocessed data.
- Create consumer applications (apriori_consumer.py, pcy_consumer.py, custom_consumer.py) that subscribe to the producer's data stream.
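A producer/consumer pair along these lines might look like the sketch below. It uses the kafka-python client (`KafkaProducer`, `KafkaConsumer`), which is an assumption since the README does not name the client library; the topic name `amazon-baskets`, the broker address, and the helper functions are hypothetical. Baskets are sent as UTF-8 JSON, and the Kafka imports are done lazily so the serialization helpers can be used and tested without a running broker.

```python
import json
from typing import Callable, Iterable, Optional

# Hypothetical topic name and broker address; adjust to match your Kafka setup.
TOPIC = "amazon-baskets"
BOOTSTRAP = "localhost:9092"


def serialize(basket: list) -> bytes:
    """Encode one basket as UTF-8 JSON for use as a Kafka message value."""
    return json.dumps(basket).encode("utf-8")


def deserialize(raw: bytes) -> list:
    """Inverse of serialize(), used on the consumer side."""
    return json.loads(raw.decode("utf-8"))


def run_producer(baskets: Iterable[list]) -> None:
    """Publish each basket to TOPIC (needs kafka-python and a running broker)."""
    from kafka import KafkaProducer  # lazy import: helpers above work without Kafka

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP, value_serializer=serialize)
    for basket in baskets:
        producer.send(TOPIC, value=basket)
    producer.flush()  # block until all buffered messages are delivered
    producer.close()


def run_consumer(handle_basket: Callable[[list], None], group_id: Optional[str] = None) -> None:
    """Read baskets from TOPIC and pass each one to handle_basket (e.g. a mining step)."""
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP,
        group_id=group_id,
        value_deserializer=deserialize,
        auto_offset_reset="earliest",  # start from the oldest retained message
    )
    for message in consumer:  # blocks, yielding messages as they arrive
        handle_basket(message.value)
```

Kafka delivers every message to each consumer group, so giving the Apriori, PCY, and custom consumers different `group_id` values lets all three analyze the full stream independently.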
Frequent Itemset Mining:
- Implement the Apriori algorithm in apriori_consumer.py.
- Implement the PCY algorithm in pcy_consumer.py.
- Implement custom analysis in custom_consumer.py.
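The two named algorithms differ mainly in how they prune candidate pairs. The sketch below is an illustrative batch version operating on a list of baskets, not the code in apriori_consumer.py or pcy_consumer.py; `min_support` is an absolute basket count and `num_buckets` is an arbitrary hypothetical default. Apriori keeps a k-itemset as a candidate only if all of its (k-1)-subsets are frequent. PCY (Park-Chen-Yu) additionally hashes every pair into buckets during the first pass and, in the second pass, counts only pairs of frequent items whose bucket total reached `min_support`.

```python
from collections import Counter
from itertools import combinations


def apriori(baskets, min_support):
    """Return {sorted itemset tuple: count} for itemsets in >= min_support baskets."""
    baskets = [set(b) for b in baskets]
    counts = Counter(item for b in baskets for item in b)
    frequent = {(item,): c for item, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        items = sorted({i for itemset in prev for i in itemset})
        # Candidates: k-itemsets whose every (k-1)-subset is frequent (Apriori property).
        candidates = [
            c for c in combinations(items, k)
            if all(sub in prev for sub in combinations(c, k - 1))
        ]
        counts = Counter()
        for b in baskets:
            for c in candidates:
                if b.issuperset(c):
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def pcy_pairs(baskets, min_support, num_buckets=1009):
    """Return {pair: count} of frequent pairs using the PCY hash-bucket filter."""
    baskets = [sorted(set(b)) for b in baskets]
    # Pass 1: count single items and hash every pair into a bucket.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for b in baskets:
        item_counts.update(b)
        for pair in combinations(b, 2):
            bucket_counts[hash(pair) % num_buckets] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    # Bitmap of frequent buckets: a pair can only be frequent if its bucket is.
    bitmap = [c >= min_support for c in bucket_counts]
    # Pass 2: count only pairs of frequent items that land in a frequent bucket.
    pair_counts = Counter()
    for b in baskets:
        for pair in combinations(b, 2):
            if (pair[0] in frequent_items and pair[1] in frequent_items
                    and bitmap[hash(pair) % num_buckets]):
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}
```

Hash collisions can only make a bucket count too high, never too low, so PCY never drops a truly frequent pair; the second pass then counts exactly. In a streaming consumer, one simple option (an assumption, not necessarily the project's approach) is to keep the most recent N baskets in a `collections.deque(maxlen=N)` and rerun these functions periodically.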
Database Integration:
- Connect each consumer to a database and store the results.
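Storing the results can be sketched with Python's built-in `sqlite3` module. SQLite is used here purely as a stand-in, because the README does not say which database the project uses; the table name `frequent_itemsets`, its schema, and the `store_itemsets` helper are hypothetical. Each row records which algorithm produced the itemset, so all consumers can share one table.

```python
import sqlite3


def store_itemsets(conn: sqlite3.Connection, algorithm: str, itemsets: dict) -> None:
    """Persist {itemset tuple: support count} results, tagged with the algorithm name."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS frequent_itemsets (
               algorithm TEXT NOT NULL,
               itemset   TEXT NOT NULL,
               support   INTEGER NOT NULL
           )"""
    )
    # Store each itemset as a comma-joined string of item IDs.
    rows = [(algorithm, ",".join(itemset), count) for itemset, count in itemsets.items()]
    conn.executemany("INSERT INTO frequent_itemsets VALUES (?, ?, ?)", rows)
    conn.commit()
```

For instance, `store_itemsets(sqlite3.connect("results.db"), "apriori", {("a", "b"): 3})` would append one row to a hypothetical `results.db` file; swapping in another database mainly means replacing the connection and the insert statement.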
Bash Script:
- Use the provided bash script to initialize the Kafka components and launch the producer and consumers.
Features:
- Sampling-based preprocessing that keeps the large dataset manageable.
- A real-time streaming pipeline that produces insights as data arrives.
- Implementations of popular frequent itemset mining algorithms (Apriori and PCY).
- Flexible database integration for persisting results.
- A bash script that automates project execution.
Getting Started:
- Clone the repository.
- Download the Amazon Metadata dataset.
- Run preprocess.ipynb to sample and preprocess the dataset.
- Run the provided bash script to initialize the Kafka components and start the producer and consumers.
- Analyze the generated frequent itemsets and association rules.
Meet the dedicated individuals who contributed to this project: