eclat

Python implementation of ECLAT algorithm for association rule mining.

Primary language: Jupyter Notebook. License: MIT.

This implementation mines rules $a \rightarrow h$, such that $a$ is an element in a transaction and $h$ is an element in the hierarchy that $a$ belongs to. This kind of rule is mined on the condition that there are transactions containing $\{a\} \cup B$, where $B$ is an itemset belonging to an element in hierarchy $h$.
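For context, the core idea of ECLAT is to store, for every item, the set of transaction ids that contain it (a tidset), and to compute the support of larger itemsets by intersecting tidsets depth-first. The following is a minimal sketch of that general technique, not necessarily how this repository implements it; the function names `to_vertical` and `eclat` are hypothetical, and items are assumed to be strings.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def to_vertical(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to the set of transaction ids (tidset) that contain it."""
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def eclat(transactions: List[Set[str]], min_sup: int = 1) -> Dict[FrozenSet[str], int]:
    """Return {itemset: support} for every itemset with support >= min_sup."""
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_sup:
                continue  # infrequent: no superset can be frequent either
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the tidsets of the remaining candidates
            # to obtain the tidsets of itemsets that are one item longer.
            deeper = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, deeper)

    extend(frozenset(), sorted(to_vertical(transactions).items()))
    return frequent


# Hypothetical usage on three small transactions.
result = eclat([{"1", "2", "3"}, {"1", "2"}, {"1", "3"}], min_sup=2)
for itemset, support in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Support here is an absolute transaction count, which matches the integer type of `min_sup` listed under ECLAT parameters; whether the repository counts support the same way is an assumption.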

Setup

$ conda env create -f environment.yml
$ conda activate eclat

Execution

Execute with default parameters:

$ python main.py

Parameters

Predefined Datasets

To run on a predefined dataset:

$ python main.py --dataset=<dataset_id>

Possible dataset_id values:

Custom Dataset

To run on a custom dataset:

$ python main.py --data=<path/to/transactions.txt> --taxonomy=<path/to/taxonomy.txt>

The taxonomy file is optional. If it is not provided, rules based on the item hierarchy are not mined.

Example of transactions.txt file format:

1 2 3
1 2
1 3
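Judging from the example above, each line appears to be one transaction with items separated by whitespace. A minimal reading sketch under that assumption follows; `parse_transactions` is a hypothetical helper, not a function from this repository, and items are kept as strings.

```python
from typing import Iterable, List, Set


def parse_transactions(lines: Iterable[str]) -> List[Set[str]]:
    """Parse one transaction per line; items are whitespace-separated."""
    transactions: List[Set[str]] = []
    for line in lines:
        items = line.split()
        if items:  # skip blank lines
            transactions.append(set(items))
    return transactions


# The example file content from above, given as a list of lines.
example_lines = ["1 2 3", "1 2", "1 3"]
transactions = parse_transactions(example_lines)
print(len(transactions))  # number of parsed transactions
```

An open file object can be passed in place of the list, for example `with open(path) as f: parse_transactions(f)`.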

Example of taxonomy.txt file format:

1,11
2,11
3,22
11,111
22,111
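The example above reads as a list of `child,parent` pairs: items 1 and 2 fall under 11, item 3 under 22, and both 11 and 22 under 111. This interpretation is an assumption based on the example only. Under it, the hierarchy can be loaded and walked upwards as sketched below; `parse_taxonomy` and `ancestors` are hypothetical helpers.

```python
from typing import Dict, Iterable, List


def parse_taxonomy(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'child,parent' lines into a child -> parent mapping."""
    parent_of: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        child, parent = line.split(",")
        parent_of[child.strip()] = parent.strip()
    return parent_of


def ancestors(item: str, parent_of: Dict[str, str]) -> List[str]:
    """Return the chain of ancestors of an item, nearest first."""
    chain: List[str] = []
    seen = {item}
    while item in parent_of:
        item = parent_of[item]
        if item in seen:  # guard against cycles in a malformed file
            break
        seen.add(item)
        chain.append(item)
    return chain


# The example file content from above, given as a list of lines.
taxonomy = parse_taxonomy(["1,11", "2,11", "3,22", "11,111", "22,111"])
print(ancestors("1", taxonomy))
print(ancestors("3", taxonomy))
```

The sketch assumes every node has at most one parent, which holds for the example but is not stated in the format description.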

ECLAT parameters

Example run with the ECLAT parameters set explicitly:

$ python main.py --min_sup=5 --min_conf=0.8 --min_len=3 --max_len=10

The options are:

  • min_sup - minimum support of the base of mined rules (type=int, default=1),
  • min_conf - minimum confidence of mined rules (type=float, default=0.5),
  • min_len - minimum length of mined rules (type=int, default=1),
  • max_len - maximum length of mined rules (type=int, default=None - not limited by default).
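The options listed above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented names, types, and defaults; whether `main.py` actually uses `argparse`, and the help texts and the `build_parser` name, are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four ECLAT options with their documented types and defaults."""
    parser = argparse.ArgumentParser(description="ECLAT association rule mining")
    parser.add_argument("--min_sup", type=int, default=1,
                        help="minimum support of the base of mined rules")
    parser.add_argument("--min_conf", type=float, default=0.5,
                        help="minimum confidence of mined rules")
    parser.add_argument("--min_len", type=int, default=1,
                        help="minimum length of mined rules")
    parser.add_argument("--max_len", type=int, default=None,
                        help="maximum length of mined rules (unlimited if omitted)")
    return parser


# Parse the same arguments as in the example command above.
args = build_parser().parse_args(
    ["--min_sup=5", "--min_conf=0.8", "--min_len=3", "--max_len=10"]
)
print(args.min_sup, args.min_conf, args.min_len, args.max_len)
```

Leaving `max_len` at `None` keeps rule length unbounded, matching the documented default.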

Unit Tests

To execute the unit tests, run the following command in the main directory:

$ python -m unittest test.test_eclat

Experiments

To run efficiency experiments:

$ python -m test.test_efficiency