Market Basket Analysis, Apriori Algorithm and Association

A Market Basket Analysis project

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Association Rules
RFM Analysis
Scenario Analysis - Bundle recommendations
Roadmap
Contributing
License
Contact
Acknowledgments

Craft Tea Fox - Craft Matcha made better

This analysis is a practical implementation of the Apriori Algorithm via Python.

Primer on Apriori Algorithm & Association Rules

Apriori is a data mining algorithm for finding frequent itemsets and the association rules derived from them. It is designed to operate on a database of transactions, such as the items bought by a customer in a store.

An itemset is considered frequent if it meets a user-specified support threshold. For example, if the support threshold is set to 0.5 (50%), a frequent itemset is a set of items that are purchased together in at least 50% of all transactions.
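
The level-wise search behind this idea can be sketched in plain Python. This is a minimal illustration, not the project's code (the project relies on mlxtend): the function name `frequent_itemsets` and the sample baskets are hypothetical. It counts support for single items first, then builds larger candidates only from itemsets that were already frequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["matcha latte", "hojicha latte"],
    ["matcha latte", "hojicha latte", "whisk set"],
    ["matcha latte", "whisk set"],
    ["hojicha latte"],
]
frequent = frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: a candidate is only counted against the data if all of its smaller subsets already passed the threshold.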

Association rules are rules derived from a database that help determine relationships among variables in a large transactional database.

For example, let I = {i(1), i(2), ..., i(m)} be a set of m attributes called items, and T = {t(1), t(2), ..., t(n)} be the set of transactions. Every transaction t(i) in T has a unique transaction ID and contains a subset of the items in I.

Association rules are usually written as i(j) -> i(k). A rule of this form means that there is a strong relationship between the purchase of item i(j) and item i(k): the two items tend to be purchased together in the same transaction.

In the above example, i(j) is the antecedent and i(k) is the consequent.

Please note that both antecedents and consequents can have multiple items. For example, {Diaper,Gum} -> {Beer, Chips} is also valid.
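
To see how many rules a single itemset can produce, the sketch below enumerates every antecedent/consequent split of the four items in the example above. `candidate_rules` is a hypothetical helper name used for illustration, not an mlxtend function.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of an itemset."""
    items = sorted(itemset)
    # Antecedent sizes run from 1 to len - 1 so both sides stay non-empty.
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


rules = list(candidate_rules({"Diaper", "Gum", "Beer", "Chips"}))
for antecedent, consequent in rules:
    print(set(antecedent), "->", set(consequent))
```

A four-item itemset already yields 14 candidate rules, one of which is {Diaper, Gum} -> {Beer, Chips}.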

Since multiple rules are possible even from a very small database, we select the most relevant ones by placing constraints on various measures of interest. The most important measures are:

1. Support: The support of an itemset X, supp(X), is the proportion of transactions in the database in which the itemset X appears. It signifies the popularity of an itemset.
supp(X) = (Number of transactions in which X appears) / (Total number of transactions)

Itemsets whose support exceeds a chosen threshold can be identified as significant itemsets.
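
As a small worked example of the support formula, the snippet below counts how often an itemset appears in a handful of transactions. The transactions and the helper name `support` are made up for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical transactions, invented for illustration only.
transactions = [
    {"Diaper", "Beer", "Chips"},
    {"Diaper", "Gum"},
    {"Beer", "Chips"},
    {"Diaper", "Beer", "Gum", "Chips"},
]

print(support({"Diaper"}, transactions))          # 3 of 4 transactions
print(support({"Diaper", "Beer"}, transactions))  # 2 of 4 transactions
```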

2. Confidence: The confidence of a rule signifies the likelihood of itemset Y being purchased when itemset X is purchased.

Thus, conf(X -> Y) = supp(X U Y) / supp(X)

If conf(X -> Y) is 75%, then Y also appears in 75% of the transactions that contain X. Confidence is the conditional probability P(Y|X): the probability of finding itemset Y in a transaction given that the transaction already contains itemset X.

3. Lift: Lift expresses the likelihood of itemset Y being purchased when itemset X is already purchased, while taking into account the popularity of Y.

Thus, lift(X -> Y) = supp(X U Y) / (supp(X) * supp(Y))

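If the value of lift is greater than 1, it means that the itemset Y is likely to be bought with itemset X, while a value less than 1 implies that the itemset Y is unlikely to be bought if the itemset X is bought. A lift of exactly 1 means X and Y appear together as often as would be expected if they were independent.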
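
The confidence and lift formulas can be checked by hand on a toy dataset. The sketch below is illustrative only: the transactions are invented, and `support`, `confidence` and `lift` are hypothetical helper names that mirror the formulas in this section.

```python
def support(itemset, transactions):
    """supp(X): fraction of transactions containing every item of X."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(x, y, transactions):
    """conf(X -> Y) = supp(X U Y) / supp(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """lift(X -> Y) = supp(X U Y) / (supp(X) * supp(Y))."""
    joint = support(set(x) | set(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))


# Hypothetical transactions, invented for illustration only.
transactions = [
    {"Diaper", "Beer", "Chips"},
    {"Diaper", "Gum"},
    {"Beer", "Chips"},
    {"Diaper", "Beer", "Gum", "Chips"},
]

print(confidence({"Diaper"}, {"Beer"}, transactions))
print(lift({"Diaper"}, {"Beer"}, transactions))
print(lift({"Beer"}, {"Chips"}, transactions))
```

In this toy data, Beer and Chips always appear together, so their lift is above 1, while Diaper -> Beer has a lift below 1 even though its confidence is about 67%, because Beer is popular on its own.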

(back to top)

Built With

Major frameworks/libraries used to bootstrap project.

(back to top)

Getting Started

Instructions on setting up your project locally. To get a local copy up and running follow these simple example steps.

Prerequisites

pip
```
pip install -r requirements
```

Installation

Installing and setting up your app.

Run Jupyter notebook on Sagemaker at https://bcg-rise-bda.awsapps.com/start#/

Clone the repo

git clone https://github.com/JohnTan38/Best-README.git

Install packages
```
pip install mlxtend
```

Import libraries

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

(back to top)

Association Rules & RFM Analysis (Recency, Frequency, Monetary)

Data Preprocessing and transformation - TransactionEncoder class from the MLXtend library

To find the unique items, flatten the dataframe and convert it into a set. This transformation removes any duplicate items.
Fit an object of the class on the list of transactions and convert the result to a dataframe.
For every item in a transaction, the encoded row holds True (1) if the item was purchased and False (0) otherwise.

  # fit the encoder on the list of transactions and encode each one as a row of True/False values
  encoder = TransactionEncoder()
  transactions = encoder.fit(matcha_list).transform(matcha_list)

  # convert the encoded array to a dataframe with one column per item
  df = pd.DataFrame(transactions, columns=encoder.columns_)

Market Basket Analysis

Market Basket Analysis is a data mining tool used by retailers to increase sales by better understanding customer purchasing patterns. Purchase history and items bought together are analyzed to reveal product groupings, as well as products that are likely to be purchased together.

Association Rules

Association analysis looks for relationships in large datasets. These relationships can take two forms: frequent itemsets or association rules. Frequent itemsets are collections of items that frequently occur together. Association rules suggest that a strong relationship exists between two items.

Frequently bought together

> Matcha latte and Hojicha latte pair with a high level of support and lift. Lift > 1 indicates that the antecedents and consequents are bought together more often than would be expected if their sales were independent.

Association Rule - Awakening Matcha Whisk set & Matcha Starter kit

> Awakening Matcha Whisk set and Matcha Starter kit bundle with a high level of support and lift.

Association Rule - Min Support 3% and Lift > 2

Closely associated products with a minimum support of 3% and lift greater than 2. Customers who add an item to their cart could have closely associated items suggested to them before checkout. Different combinations and thresholds of support and lift return different association rules.
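
Applying such thresholds to a table of rules is a simple filter. The sketch below assumes a dataframe with `support` and `lift` columns, as returned by mlxtend's `association_rules`. The rows and item names are invented, and `select_rules` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical rules table; every number here is invented for illustration.
rules = pd.DataFrame(
    {
        "antecedents": [frozenset({"item A"}), frozenset({"item B"}), frozenset({"item C"})],
        "consequents": [frozenset({"item B"}), frozenset({"item D"}), frozenset({"item A"})],
        "support": [0.05, 0.02, 0.04],
        "lift": [2.6, 3.1, 1.4],
    }
)


def select_rules(rules, min_support=0.03, min_lift=2.0):
    """Keep rules with support >= min_support and lift > min_lift."""
    mask = (rules["support"] >= min_support) & (rules["lift"] > min_lift)
    return rules[mask].sort_values("lift", ascending=False)


selected = select_rules(rules)
print(selected)

# Loosening the lift threshold returns a different set of rules.
print(select_rules(rules, min_lift=1.0))
```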

(back to top)

RFM Analysis

Customers' recency, frequency and monetary (transaction value) measures are analyzed, and K-Means clustering is used to group customers into distinct segments.

Customer segmentation is fine-tuned with detailed analysis, and RFM segments are identified. For example, top customers in RFM segment '144', who buy frequently and with high ticket values, could be offered a bundle of 'Awakening Matcha Whisk Set' with 'Ceremonial Uji Matcha Powder'.
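
One common way to arrive at three-digit segment codes such as '144' is to score each of recency, frequency and monetary value into quartiles. The sketch below illustrates that approach with pandas. It is an assumption about how the codes are built, not the project's method, and it does not show the K-Means step. The order data, column names and snapshot date are invented. It also assumes that a recency score of 1 means "most recent", which is consistent with '144' describing frequent, high-value customers.

```python
import pandas as pd

# Hypothetical order lines; names and values are invented for illustration.
orders = pd.DataFrame(
    {
        "customer_id": ["c1", "c1", "c1", "c2", "c3", "c3", "c3", "c3", "c4", "c4"],
        "order_date": pd.to_datetime(
            [
                "2021-02-01", "2021-03-01", "2021-06-15",
                "2021-01-10",
                "2021-08-01", "2021-10-01", "2021-11-20", "2021-12-20",
                "2021-07-01", "2021-09-05",
            ]
        ),
        "amount": [30, 50, 40, 20, 70, 80, 120, 100, 25, 60],
    }
)
snapshot = pd.Timestamp("2022-01-01")  # assumed analysis date

# Recency in days since last order, number of orders, total spend.
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)


def quartile_score(series):
    # Ranking first breaks ties so that qcut always finds four distinct bins.
    return pd.qcut(series.rank(method="first"), 4, labels=[1, 2, 3, 4]).astype(int)


rfm["R"] = quartile_score(rfm["recency"])    # 1 = most recent (assumption)
rfm["F"] = quartile_score(rfm["frequency"])  # 4 = most frequent
rfm["M"] = quartile_score(rfm["monetary"])   # 4 = highest spend
rfm["segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm)
```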

Association Rule + RFM - Opportunities for targeted cross-selling

Customers' RFM segments and closely associated products provide opportunities for targeted cross-selling. Customers of RFM segment '444' who bought 'Awakening Matcha Whisk Set' could have 'Matcha Starter Kit' recommended.
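
The combination of segment and rule can be expressed as a small lookup. The sketch below is hypothetical: the `recommend` function, the rule table and the targeted segment set are illustrative stand-ins built from the example in the paragraph above.

```python
# Hypothetical rule table: antecedent itemset -> list of consequent items.
rules = {
    frozenset({"Awakening Matcha Whisk Set"}): ["Matcha Starter Kit"],
}
# Hypothetical set of RFM segments chosen for the campaign.
target_segments = {"444"}


def recommend(segment, basket, rules, target_segments):
    """Suggest consequents of matching rules for customers in targeted segments."""
    if segment not in target_segments:
        return []
    basket = set(basket)
    suggestions = []
    for antecedent, consequents in rules.items():
        # A rule applies when the basket contains the whole antecedent.
        if antecedent <= basket:
            for item in consequents:
                if item not in basket and item not in suggestions:
                    suggestions.append(item)
    return suggestions


print(recommend("444", ["Awakening Matcha Whisk Set"], rules, target_segments))
print(recommend("111", ["Awakening Matcha Whisk Set"], rules, target_segments))
```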

Sales Trends -

Sales are consistent all year except for the last quarter of 2021.

Matcha Starter Kit enjoys high support and lift. A sales campaign could smooth out the sales trend during the 2nd and 3rd quarters, and a successful campaign would increase gross profit.

(back to top)

Scenario Analysis - Bundle recommendations

Potential uplift of 35% in gross sales of Awakening Matcha Whisk Set.

Potential uplift of 52% in gross sales of Ceremonial Uji Matcha Powder.

Potential uplift of 18% in gross sales of Barista Uji Matcha Powder.

Pros and Cons of Apriori Algorithm

Easy to understand
Suitable for large itemsets

Computationally expensive if there are many association rules
Calculating support is expensive because the algorithm scans the entire dataset
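
To give a feel for why the number of rules becomes a cost problem, the sketch below counts how many itemsets and rules are possible for a given number of distinct items before any support threshold is applied. The function names are hypothetical, and the closed-form rule count is cross-checked against a direct sum over itemset sizes.

```python
from math import comb


def count_itemsets(n_items):
    """Number of non-empty itemsets over n_items distinct items."""
    return 2 ** n_items - 1


def count_rules(n_items):
    """Number of possible rules X -> Y with X, Y non-empty and disjoint."""
    return 3 ** n_items - 2 ** (n_items + 1) + 1


def count_rules_by_size(n_items):
    """Same count, summed over itemset sizes as a cross-check."""
    total = 0
    for size in range(2, n_items + 1):
        # Each itemset of this size has 2**size - 2 antecedent/consequent splits.
        total += comb(n_items, size) * (2 ** size - 2)
    return total


for n in (4, 6, 10, 20):
    print(n, count_itemsets(n), count_rules(n))
```

The counts grow exponentially with the number of items, which is why support-based pruning and sensible thresholds matter in practice.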

For more examples, please refer to the Documentation.

(back to top)

Roadmap

Data collection - customers' demographic profile
Search Engine Optimization (SEO) & click-through rates (CTR)
Google Analytics 360 - data driven attribution
Fine tune threshold values for Support and Lift
Multi-language Support
- Chinese
- Bahasa Indonesia

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Contributors ✨

This project follows the all-contributors specification. Contributions of any kind welcome!

Support:

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

vieming@gmail.com

Project Link: https://github.com/JohnTan38/Best-README

(back to top)

Acknowledgments

(back to top)