- Collect Transactions: Gather a dataset where each transaction is a list of items purchased together.
- Example transactions:
[ ['Albany White Bread', 'Sasko Butter', 'Clover Milk'], ['Sasko Whole Wheat Bread', 'Clover Milk'], ['Albany White Bread', 'Clover Milk', 'Freshpak Rooibos Tea'], ['Albany Rye Bread', 'Clover Butter', 'Parmalat Milk'], ['Sasko White Bread', 'Clover Milk', 'Nescafe Coffee'] ]
- Example transactions:
- Set Minimum Support Threshold: Determine the minimum support threshold (e.g., 0.2 or 20%).
- Identify Frequent Itemsets: Use the Apriori algorithm to find itemsets that appear in at least the minimum support threshold of transactions.
- Set Minimum Confidence Threshold: Determine the minimum confidence threshold (e.g., 0.6 or 60%).
- Create Association Rules: Generate rules from the frequent itemsets that meet the minimum confidence threshold.
Here’s how you can implement this using Python and the mlxtend
library:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
# Step 1: Data Preparation
transactions = [
['Albany White Bread', 'Sasko Butter', 'Clover Milk'],
['Sasko Whole Wheat Bread', 'Clover Milk'],
['Albany White Bread', 'Clover Milk', 'Freshpak Rooibos Tea'],
['Albany Rye Bread', 'Clover Butter', 'Parmalat Milk'],
['Sasko White Bread', 'Clover Milk', 'Nescafe Coffee']
]
# Convert transactions to a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
# Step 2: Generate Frequent Itemsets
min_support = 0.2 # 20%
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)
# Step 3: Generate Association Rules
min_confidence = 0.6 # 60%
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)
# Display the rules
print(rules)
-
Data Preparation:
- Convert the list of transactions into a format suitable for analysis. Here, each transaction is encoded as a binary (one-hot) matrix.
-
Generate Frequent Itemsets:
- Use the Apriori algorithm to find itemsets that meet the minimum support threshold. An itemset is frequent if it appears in at least 20% of transactions.
-
Generate Association Rules:
- From the frequent itemsets, generate rules that have a confidence level above 60%. Confidence is defined as the likelihood that an item B is purchased when item A is purchased.
Assuming the rules
DataFrame is generated, it might look something like this:
antecedents consequents support confidence lift
0 (Albany White Bread) (Clover Milk) 0.4 0.75 1.25
1 (Sasko Whole Wheat Bread) (Clover Milk) 0.2 0.67 1.67
2 (Albany Rye Bread) (Clover Butter) 0.2 1.00 2.00
- Rule 1: If a customer buys Albany White Bread, there is a 75% chance they will also buy Clover Milk. This rule appears in 40% of the transactions.
- Rule 2: If a customer buys Sasko Whole Wheat Bread, there is a 67% chance they will also buy Clover Milk. This rule appears in 20% of the transactions.
- Rule 3: If a customer buys Albany Rye Bread, they always (100% of the time) buy Clover Butter. This rule appears in 20% of the transactions.