Apriori_ITDAA4-12: A repository from dieumerci

Algorithm: Apriori for Association Rule Mining

Step 1: Data Preparation

Collect Transactions: Gather a dataset where each transaction is a list of items purchased together.

Example transactions:

[
  ['Albany White Bread', 'Sasko Butter', 'Clover Milk'],
  ['Sasko Whole Wheat Bread', 'Clover Milk'],
  ['Albany White Bread', 'Clover Milk', 'Freshpak Rooibos Tea'],
  ['Albany Rye Bread', 'Clover Butter', 'Parmalat Milk'],
  ['Sasko White Bread', 'Clover Milk', 'Nescafe Coffee']
]

Step 2: Generate Frequent Itemsets

Set Minimum Support Threshold: Determine the minimum support threshold (e.g., 0.2 or 20%).
Identify Frequent Itemsets: Use the Apriori algorithm to find itemsets that appear in at least the minimum support threshold of transactions.

Step 3: Generate Association Rules

Set Minimum Confidence Threshold: Determine the minimum confidence threshold (e.g., 0.6 or 60%).
Create Association Rules: Generate rules from the frequent itemsets that meet the minimum confidence threshold.

Step 4: Implementation Example

Here’s how you can implement this using Python and the mlxtend library:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Step 1: Data Preparation
transactions = [
    ['Albany White Bread', 'Sasko Butter', 'Clover Milk'],
    ['Sasko Whole Wheat Bread', 'Clover Milk'],
    ['Albany White Bread', 'Clover Milk', 'Freshpak Rooibos Tea'],
    ['Albany Rye Bread', 'Clover Butter', 'Parmalat Milk'],
    ['Sasko White Bread', 'Clover Milk', 'Nescafe Coffee']
]

# Convert transactions to a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Step 2: Generate Frequent Itemsets
min_support = 0.2  # 20%
frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)

# Step 3: Generate Association Rules
min_confidence = 0.6  # 60%
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=min_confidence)

# Display the rules
print(rules)

Explanation of the Steps

Data Preparation:
- Convert the list of transactions into a format suitable for analysis. Here, each transaction is encoded as a binary (one-hot) matrix.
Generate Frequent Itemsets:
- Use the Apriori algorithm to find itemsets that meet the minimum support threshold. An itemset is frequent if it appears in at least 20% of transactions.
Generate Association Rules:
- From the frequent itemsets, generate rules that have a confidence level above 60%. Confidence is defined as the likelihood that an item B is purchased when item A is purchased.

Sample Output

Assuming the rules DataFrame is generated, it might look something like this:

                       antecedents          consequents  support  confidence  lift
0          (Albany White Bread)        (Clover Milk)      0.4      0.75       1.25
1         (Sasko Whole Wheat Bread) (Clover Milk)      0.2      0.67       1.67
2            (Albany Rye Bread)        (Clover Butter)   0.2      1.00       2.00

Interpretation

Rule 1: If a customer buys Albany White Bread, there is a 75% chance they will also buy Clover Milk. This rule appears in 40% of the transactions.
Rule 2: If a customer buys Sasko Whole Wheat Bread, there is a 67% chance they will also buy Clover Milk. This rule appears in 20% of the transactions.
Rule 3: If a customer buys Albany Rye Bread, they always (100% of the time) buy Clover Butter. This rule appears in 20% of the transactions.

dieumerci/Apriori_ITDAA4-12