This project aims to analyze historical data of client receipts to identify items frequently and rarely sold together. By examining transactional data, I uncover patterns that could inform product placement, marketing strategies, and inventory management. The analysis is implemented using Python and various data analysis libraries.
The primary goal is to derive actionable insights on which items are often purchased together and which are seldom paired, facilitating strategic decisions in product bundling, cross-selling opportunities, and enhancing the shopping experience.
- Pandas & Numpy: For data manipulation and analysis.
- Matplotlib & Seaborn: For data visualization.
- MLxtend: For frequent itemset and association rule mining.
- NetworkX: For creating and visualizing networks of item pairs.
- Objective: Identify item pairs with high support, lift, and leverage as indicators of frequent co-purchase, and item pairs with low values in these metrics to identify rare combinations.
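- Key Metrics:
- Lift = Support(A&B) / (Support(A) * Support(B))
- Leverage = Support(A&B) - Support(A) * Support(B)
Here Support(X) is the proportion of transactions that contain the item set X. A lift above 1 (equivalently, a positive leverage) means A and B appear together more often than they would if purchased independently; a lift below 1 (negative leverage) means they appear together less often.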
- Data Source: The data comes from the H&M Kaggle competition: https://www.kaggle.com/c/h-and-m-personalized-fashion-recommendations/overview
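As a quick check of the lift and leverage definitions above, here is a minimal pure-Python sketch. The five toy baskets and the helper names `support` and `pair_metrics` are made up for illustration and are not taken from the H&M data or the project code.

```python
from itertools import combinations

# Toy baskets (hypothetical, for illustration only).
transactions = [
    {"jeans", "t-shirt"},
    {"jeans", "t-shirt", "socks"},
    {"jeans", "socks"},
    {"t-shirt"},
    {"socks"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def pair_metrics(a, b, transactions):
    """Support, lift, and leverage for the pair (a, b)."""
    s_a = support({a}, transactions)
    s_b = support({b}, transactions)
    s_ab = support({a, b}, transactions)
    return {
        "support": s_ab,
        "lift": s_ab / (s_a * s_b),
        "leverage": s_ab - s_a * s_b,
    }


# Every item appears in at least one basket, so no support is zero here.
items = sorted(set().union(*transactions))
metrics = {
    pair: pair_metrics(*pair, transactions) for pair in combinations(items, 2)
}

for pair, m in metrics.items():
    print(pair, {name: round(value, 3) for name, value in m.items()})
```

In this toy data, jeans and t-shirt share 2 of 5 baskets while each appears in 3 of 5, so their lift is 0.4 / 0.36 (above 1) and their leverage is 0.04. Socks and t-shirt share only one basket, which gives a lift below 1 and a negative leverage.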
- Merged transaction data with article information to enrich the dataset.
- Grouped transactions by customer ID and date, treating purchases on the same day as a single transaction.
- Utilized the `TransactionEncoder` from mlxtend to convert transaction lists into a sparse matrix format, suitable for frequent itemset mining.
- Visualized the frequency of items sold across different sections to understand the distribution of transactions.
- Employed the FPGrowth algorithm to identify frequent itemsets of varying lengths.
- Applied the FPGrowth algorithm to the transactional data to find frequent itemsets above a minimum support threshold.
- Generated association rules from frequent itemsets to calculate metrics like lift and leverage for item pairs.
- Filtered the association rules to focus on item pairs, analyzing the most and least frequent pairs based on lift and leverage.
- Visualized the relationships between item pairs using network graphs to illustrate items often and rarely sold together.
Strategy: These sections contain items that are frequently sold together but are not among the most popular.
- Bundling Products: Bundling items from these sections could help raise interest and sales.
- Optimize Store Layout: Place these sections close to each other.
Strategy: These sections contain items that are frequently sold together, and the sections themselves are also popular.
- Optimize Store Layout: Place these sections close to each other.
- Discounting One Item: Discount items in one of these sections, but not the other. Customers are probably willing to pay full price to ‘complete’ the pair.
Strategy: Items that are least frequently purchased together can be placed far apart in the store.
- Identified the most frequently bought item pairs, highlighting opportunities for cross-selling and product placement optimization.
- Discovered item pairs that are rarely or never bought together, indicating potential areas for marketing interventions or inventory adjustments.
- Visualized networks of item associations, providing a clear overview of product interrelationships and shopper behavior patterns.
This analysis sheds light on shopper behaviors, revealing patterns in item co-purchases that can drive strategic business decisions. Future work could extend this analysis by incorporating more granular data, such as item categories or customer demographics, to further refine recommendations for product bundling and store layout optimization.