eclat

A Python implementation of the ECLAT algorithm for association rule mining.

This implementation mines rules that relate an item a in a transaction to an element of the hierarchy that a belongs to. A rule of this kind is mined only if there are transactions involving an itemset that belongs to an element of that hierarchy.
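For orientation, the core idea of ECLAT (a vertical data layout plus tidset intersection) can be sketched as below. This is a minimal illustration and not the code in this repository: the function name `eclat`, its signature, and the returned dictionary are assumptions made for the example, and it covers only frequent itemset mining, without rule generation or hierarchy handling.

```python
from collections import defaultdict


def eclat(transactions, min_sup=1):
    """Illustrative sketch: return {frozenset(itemset): support}."""
    # Vertical layout: map each item to the set of transaction ids (tidset)
    # in which it occurs.
    tidsets = defaultdict(set)
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: list of (item, tidset) pairs, where each tidset is
        # already restricted to transactions containing the prefix.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_sup:
                continue  # infrequent, so no superset can be frequent
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Support of a longer itemset is the size of the intersection
            # of the tidsets of its parts.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend(frozenset(), sorted(tidsets.items()))
    return frequent


# Same three transactions as the transactions.txt example in this README.
transactions = [[1, 2, 3], [1, 2], [1, 3]]
result = eclat(transactions, min_sup=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

With `min_sup=2`, the itemset {2, 3} is dropped because items 2 and 3 share only one transaction.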

Setup

$ conda env create -f environment.yml
$ conda activate eclat

Execution

To run with the default parameters:

$ python main.py

Parameters

Predefined Datasets

To run on a predefined dataset:

$ python main.py --dataset=<dataset_id>

Possible dataset_id values:

0 - small debugging dataset,
1 - FruitHut dataset,
2 - Liquor11 dataset.

Custom Dataset

To run on a custom dataset:

$ python main.py --data=<path/to/transactions.txt> --taxonomy=<path/to/taxonomy.txt>

The taxonomy file is optional. If no taxonomy is provided, rules based on the item hierarchy are not mined.

Example of transactions.txt file format:

1 2 3
1 2
1 3

Example of taxonomy.txt file format:

1,11
2,11
3,22
11,111
22,111
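The two example files above could be read along the following lines. This is a hedged sketch rather than the loader used by main.py: the helper names `read_transactions`, `read_taxonomy`, and `ancestors` are hypothetical, and the reading of each taxonomy line as a `child,parent` pair is an assumption based on the example (item 1 under 11, 11 under 111); it also assumes the taxonomy contains no cycles.

```python
import io


def read_transactions(lines):
    # One transaction per line, items separated by whitespace.
    return [line.split() for line in lines if line.strip()]


def read_taxonomy(lines):
    # ASSUMPTION: each line is "child,parent".
    parents = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        child, parent = line.split(",")
        parents[child.strip()] = parent.strip()
    return parents


def ancestors(item, parents):
    # Walk up the hierarchy from item to the root (assumes no cycles).
    chain = []
    while item in parents:
        item = parents[item]
        chain.append(item)
    return chain


# In-memory stand-ins for transactions.txt and taxonomy.txt.
transactions = read_transactions(io.StringIO("1 2 3\n1 2\n1 3\n"))
taxonomy = read_taxonomy(io.StringIO("1,11\n2,11\n3,22\n11,111\n22,111\n"))

print(transactions)
print(ancestors("1", taxonomy))
```

Any iterable of lines works as input, so an open file object can be passed in place of `io.StringIO`.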

ECLAT parameters

An example run with the ECLAT parameters set explicitly:

$ python main.py --min_sup=5 --min_conf=0.8 --min_len=3 --max_len=10

The options are:

min_sup - minimum support of the base of mined rules (type=int, default=1),
min_conf - minimum confidence of mined rules (type=float, default=0.5),
min_len - minimum length of mined rules (type=int, default=1),
max_len - maximum length of mined rules (type=int, default=None - not limited by default).
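The options listed above map naturally onto an `argparse` parser. The sketch below only mirrors the documented names, types, and defaults; it is not taken from main.py, and the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    # Illustrative parser mirroring the documented ECLAT options.
    parser = argparse.ArgumentParser(description="ECLAT parameters (sketch)")
    parser.add_argument("--min_sup", type=int, default=1,
                        help="minimum support of the base of mined rules")
    parser.add_argument("--min_conf", type=float, default=0.5,
                        help="minimum confidence of mined rules")
    parser.add_argument("--min_len", type=int, default=1,
                        help="minimum length of mined rules")
    parser.add_argument("--max_len", type=int, default=None,
                        help="maximum length of mined rules (unlimited if omitted)")
    return parser


# Parse the same flags as in the example command above.
args = build_parser().parse_args(
    ["--min_sup=5", "--min_conf=0.8", "--min_len=3", "--max_len=10"]
)
# With no flags, every option falls back to its documented default.
defaults = build_parser().parse_args([])

print(args)
print(defaults)
```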

Unit Tests

To execute the unit tests, run the following command in the main directory:

$ python -m unittest test.test_eclat

Experiments

To run the efficiency experiments:

$ python -m test.test_efficiency

piotrfratczak/eclat