This repository contains two frequent itemset and association rule mining algorithms: the Apriori algorithm and the FP Tree algorithm.
The Apriori algorithm is implemented in Python 3 in the apriori.py file, which houses the functions related to the algorithm. The main functions are:
- freq_itemset_gen(): Generates frequent itemsets for a given transaction database for some given minimum support.
- maximal_itemset_gen(): Generates maximal frequent itemsets from the given frequent itemsets.
- closed_freq_itemset_gen(): Generates closed frequent itemsets from the given frequent itemsets.
- rule_generation(): Generates the association rules from the frequent itemsets for a given minimum confidence.
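The relationship between frequent, maximal, and closed itemsets can be illustrated with a minimal sketch. It is not the code in apriori.py: the function names, the use of frozenset, and the toy transactions below are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: keep candidates that meet the minimum support.
        current = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                current[c] = count
        frequent.update(current)
        k += 1
    return frequent


def maximal_itemsets(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def closed_itemsets(frequent):
    """Frequent itemsets that have no proper superset with the same support."""
    return {
        s for s, c in frequent.items()
        if not any(s < other and frequent[other] == c for other in frequent)
    }


# Hypothetical toy data, not taken from groceries.csv.
toy = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"], ["milk", "eggs"]]
freq = frequent_itemsets(toy, min_support=2)
print(len(freq), len(maximal_itemsets(freq)), len(closed_itemsets(freq)))
```

In this toy example every single item and every pair is frequent, only the pairs are maximal, and all six frequent itemsets are closed because no superset shares its support count.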
The current implementation might occasionally crash while generating association rules. This happens because an itemset such as (i, j) is not treated as equal to (j, i), so the comparison operations performed during rule generation can throw an exception.
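One possible way to avoid this kind of ordering problem is to key support counts by frozenset, which compares equal regardless of item order. The sketch below assumes itemsets are currently stored as tuples, which the description above suggests but does not confirm. The function generate_rules and the sample supports are hypothetical and are not the repository's rule_generation.

```python
from itertools import combinations

# Tuples are order-sensitive, frozensets are not.
print(("i", "j") == ("j", "i"))                        # False
print(frozenset(("i", "j")) == frozenset(("j", "i")))  # True


def generate_rules(supports, min_confidence):
    """Build (antecedent, consequent, confidence) rules.

    `supports` maps frozenset itemsets to support counts and is assumed to
    contain every frequent itemset, so each antecedent lookup succeeds.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                # confidence(X -> Y) = support(X and Y together) / support(X)
                confidence = count / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


# Hypothetical support counts for illustration.
supports = {
    frozenset(["bread"]): 4,
    frozenset(["milk"]): 2,
    frozenset(["bread", "milk"]): 2,
}
for antecedent, consequent, confidence in generate_rules(supports, 0.5):
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

Sorting each itemset into a tuple before storing it would be another option. The frozenset approach is used here only because it also makes subset checks straightforward.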
The current implementation uses thread pools to try to reduce computation costs.
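As a rough illustration of how a thread pool might be applied to support counting, the sketch below uses ThreadPool from the multiprocessing package listed among the dependencies. How apriori.py actually divides the work is not described here, so the function names, the worker count, and the per-candidate split are all assumptions.

```python
from multiprocessing.pool import ThreadPool


def count_support(candidate, transactions):
    """Count the transactions that contain every item of the candidate."""
    return sum(1 for t in transactions if candidate <= t)


def count_supports_parallel(candidates, transactions, workers=4):
    """Count the support of each candidate using a pool of worker threads."""
    candidates = list(candidates)
    with ThreadPool(workers) as pool:
        counts = pool.starmap(
            count_support, [(c, transactions) for c in candidates]
        )
    return dict(zip(candidates, counts))


# Hypothetical toy data for illustration.
transactions = [frozenset(t) for t in (["a", "b"], ["a", "c"], ["a", "b", "c"])]
candidates = [frozenset(["a", "b"]), frozenset(["b", "c"])]
print(count_supports_parallel(candidates, transactions))
```

Because this counting is CPU-bound pure Python, threads in CPython may give little speedup. A process pool (multiprocessing.Pool) exposes the same starmap call and could be swapped in to compare.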
The current average performance metrics of the apriori algorithm are as follows:
Dataset: groceries.csv
Minimum support: 200
Minimum confidence: 50%
Generated data | Avg time taken |
---|---|
Frequent itemsets | 17.46 |
Maximal frequent itemsets | 0.000337 |
Closed frequent itemsets | 0.000339 |
Association rules | 0.000769 |
The main functions of the FP Tree implementation are:
- insert(): Inserts a transaction into the tree; a wrapper that calls insert_recurse()
- insert_recurse(): Recursively descends the tree and inserts the transaction into it
- print_nodes(): Utility for printing the support, the parent, and the children of each node in the tree
- print_supports(): Utility for printing the support count of each item
- reset_nodes(): Used by get_frequent_itemsets_with_suffix() to extract the candidate itemsets
- generate_conditional_fp_tree(): Traverses the tree and generates the conditional FP tree
- reset_values(): Resets the temp_val of each node
- get_frequent_itemsets_with_suffix(): Recursively returns the frequent itemsets
- fp_growth(): Wrapper used to extract the frequent itemsets
- process_dataset(): Preprocesses the dataset: sorts items according to their supports and removes None values
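The insertion and preprocessing steps listed above can be sketched as follows. This is a simplified illustration, not the repository's code: the FPNode and FPTree classes, the order_by_support helper, and the toy transactions are assumptions, and details such as temp_val and header links are omitted.

```python
from collections import Counter


class FPNode:
    def __init__(self, item=None, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


class FPTree:
    def __init__(self):
        self.root = FPNode()

    def insert(self, transaction):
        """Wrapper: start the recursive insertion at the root."""
        self.insert_recurse(self.root, list(transaction))

    def insert_recurse(self, node, items):
        """Follow or create the child for the first item, then recurse."""
        if not items:
            return
        head, rest = items[0], items[1:]
        child = node.children.get(head)
        if child is None:
            child = FPNode(head, parent=node)
            node.children[head] = child
        child.count += 1
        self.insert_recurse(child, rest)


def order_by_support(transactions, min_support):
    """Drop infrequent items and sort each transaction by descending support."""
    counts = Counter(item for t in transactions for item in set(t))
    ordered = []
    for t in transactions:
        kept = [i for i in set(t) if counts[i] >= min_support]
        # Ties are broken by item name so the order is deterministic.
        kept.sort(key=lambda i: (-counts[i], i))
        ordered.append(kept)
    return ordered


# Hypothetical toy data for illustration.
raw = [["bread", "milk"], ["bread", "eggs"], ["bread", "milk", "eggs"], ["jam"]]
tree = FPTree()
for transaction in order_by_support(raw, min_support=2):
    tree.insert(transaction)
print(tree.root.children["bread"].count)
```

Sorting every transaction by the same support order means transactions that share frequent items also share a prefix path, which is what keeps the tree compact.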
The project uses:
- Python3
- Pandas
- OS
- Itertools
- Argparse
- Pprint
- Multiprocessing
Naren Surampudi [https://github.com/nsurampu]
Aditya Srikanth
Prateek Dasgupta