Algorithm: Apriori for Association Rule Mining

Step 1: Data Preparation

  1. Collect Transactions: Gather a dataset where each transaction is a list of items purchased together.
    • Example transactions:
      [
        ['Albany White Bread', 'Sasko Butter', 'Clover Milk'],
        ['Sasko Whole Wheat Bread', 'Clover Milk'],
        ['Albany White Bread', 'Clover Milk', 'Freshpak Rooibos Tea'],
        ['Albany Rye Bread', 'Clover Butter', 'Parmalat Milk'],
        ['Sasko White Bread', 'Clover Milk', 'Nescafe Coffee']
      ]
      

Step 2: Generate Frequent Itemsets

  1. Set Minimum Support Threshold: Determine the minimum support threshold (e.g., 0.2 or 20%).
  2. Identify Frequent Itemsets: Use the Apriori algorithm to find itemsets that appear in at least the minimum support threshold of transactions.

Step 3: Generate Association Rules

  1. Set Minimum Confidence Threshold: Determine the minimum confidence threshold (e.g., 0.6 or 60%).
  2. Create Association Rules: Generate rules from the frequent itemsets that meet the minimum confidence threshold.

Step 4: Implementation Example

Here’s how you can implement this using Python and the mlxtend library:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Step 1: Data Preparation
transactions = [
    ['Albany White Bread', 'Sasko Butter', 'Clover Milk'],
    ['Sasko Whole Wheat Bread', 'Clover Milk'],
    ['Albany White Bread', 'Clover Milk', 'Freshpak Rooibos Tea'],
    ['Albany Rye Bread', 'Clover Butter', 'Parmalat Milk'],
    ['Sasko White Bread', 'Clover Milk', 'Nescafe Coffee']
]

# Convert transactions to a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Step 2: Generate Frequent Itemsets
min_support = 0.2  # 20%
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Step 3: Generate Association Rules
min_confidence = 0.6  # 60%
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# Display the rules
print(rules)

Explanation of the Steps

  1. Data Preparation:

    • Convert the list of transactions into a format suitable for analysis. Here, each transaction is encoded as a binary (one-hot) matrix.
  2. Generate Frequent Itemsets:

    • Use the Apriori algorithm to find itemsets that meet the minimum support threshold. An itemset is frequent if it appears in at least 20% of transactions.
  3. Generate Association Rules:

    • From the frequent itemsets, generate rules that have a confidence level above 60%. Confidence is defined as the likelihood that an item B is purchased when item A is purchased.

Sample Output

Assuming the rules DataFrame is generated, it might look something like this:

                       antecedents          consequents  support  confidence  lift
0          (Albany White Bread)        (Clover Milk)      0.4      0.75       1.25
1         (Sasko Whole Wheat Bread) (Clover Milk)      0.2      0.67       1.67
2            (Albany Rye Bread)        (Clover Butter)   0.2      1.00       2.00

Interpretation

  • Rule 1: If a customer buys Albany White Bread, there is a 75% chance they will also buy Clover Milk. This rule appears in 40% of the transactions.
  • Rule 2: If a customer buys Sasko Whole Wheat Bread, there is a 67% chance they will also buy Clover Milk. This rule appears in 20% of the transactions.
  • Rule 3: If a customer buys Albany Rye Bread, they always (100% of the time) buy Clover Butter. This rule appears in 20% of the transactions.