/Apriori-and-Eclat-Frequent-Itemset-Mining

Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions, implementation in Python.

Primary LanguagePythonMIT LicenseMIT

Data Mining: Apriori and Eclat Frequent Itemset Mining

Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions, implementation in Python.

Implementaions

  • Apriori algorithm
  • Eclat algorithm (recursive method w/ GPU acceleration support)
  • Eclat algorithm (iterative method)

Requirements

  • < Python 3.6+ >
  • < NVIDIA CUDA 9.0 > (Optional)
  • < Pycuda 2018.1.1 > (Optional)
  • < g++ [gcc version 6.4.0 (GCC)] > (Optional)

Environment Setup

sudo pip3 install pycuda
  • Refer here for "CUDA unsupported GNU version" problem, or follow the following steps:
1. sudo apt-get install gcc-6
2. sudo apt-get install g++-6
3. sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-6 10
4. sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 10

Datasets:

  • ./data/data.txt: suggested min support range: [0.6 0.02]
  • ./data/data2.txt: a harder dataset, only eclat can find results in reasonable time. Suggested min support range: [0.1 0.0002]

Usage

  • To run the Apriori / Cclat algorithm with defaul settings:
python3 runner.py apriori
python3 runner.py eclat
  • Other arguments can be given by:
python3 runner.py [mode] --min_support 0.6 --input_path ./data/data.txt --output_path ./data/output.txt
  • To run Eclat with GPU acceleration (Suggested dataset: data2.txt):
python3 runner.py eclat --min_support 0.02 --input_path ./data/data2.txt --use_CUDA
  • To plot run time v.s. different experiment values:
python runner.py [mode] --plot_support
python runner.py [mode] --plot_support_gpu --input_path ./data/data2.txt --use_CUDA
python runner.py [mode] --compare_gpu --input_path ./data/data2.txt --use_CUDA
python runner.py [mode] --plot_thread --input_path ./data/data2.txt --use_CUDA
python runner.py [mode] --plot_block --input_path ./data/data2.txt --use_CUDA
  • To test with toy data:
python runner.py [mode] --toy_data
  • To run the eclat algorithm with the iterative method:
python runner.py [mode] --iterative

Apriori minimum support v.s. run time plot

Eclat minimum support v.s. run time plot

Eclat minimum support v.s. run time plot (data2.txt w/ GPU version)

Eclat w/ GPU and w/o GPU comparison plot (data2.txt w/ GPU version)

Reference