This repository contains the source code for GIM-RL, an algorithm that mines itemsets with a reinforcement learning agent.
The algorithm supports the tasks of High Utility Itemset Mining, Frequent Itemset Mining, and Association Rule Mining; for each task, itemsets are mined once the dataset to be mined and the threshold value are specified.
GIM-RL offers a unified framework to extract various types of itemsets.
The source code can easily be extended to extract a different type of itemset by defining your own reward.
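As a sketch of that extension point (the actual reward interface in the repository may differ, so the names here are hypothetical), a reward can be built from any interestingness measure and a threshold:

```python
# Hypothetical sketch, not the repository's actual interface: a reward
# built from an arbitrary interestingness measure and a threshold.

def threshold_reward(measure, itemset, transactions, threshold):
    """Return 1.0 when the measure of the itemset over the transactions
    meets the threshold, else 0.0. `measure` is any function mapping
    (itemset, transactions) to a number, e.g. support or utility."""
    return 1.0 if measure(itemset, transactions) >= threshold else 0.0
```

Supporting a new itemset type then amounts to supplying a different measure function.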
It is recommended to build the source code in a Docker environment with the following dependencies.
- GPUs that support CUDA, with NVIDIA drivers
- Docker (>= 19.03)
- nvidia-docker
- make
Please follow the steps below to set up.
-
Build a Docker image, create a container, and attach to it.
make up
-
Download the datasets.
./download_data.sh
-
Start MLflow using tmux or background execution.
mlflow ui --host 0.0.0.0
After starting MLflow, access localhost:5000 with a browser and check the execution records there.
Use the exit command to leave the container, and the make up command to reattach to it.
Hydra is used to manage parameters; the results are aggregated in MLflow and can be checked from a browser.
Hydra selects the dataset and agent to be used and configures various parameters, which are described in detail in config.yaml. In addition, installing the Joblib Launcher plugin enables parallel execution via the -m flag.
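As an illustration of the Hydra overrides used in the run commands below, the relevant part of config.yaml might look like the following. This is a hypothetical excerpt: only the keys that appear in the run commands are taken from this README, and the default values are illustrative.

```yaml
# Hypothetical excerpt of config.yaml; only keys appearing in the run
# commands are taken from this README, default values are illustrative.
defaults:
  - dataset/hui: chess
agent:
  network: simple
  lambda_end: 0.6
interaction:
  episodes: 1000
```

Any of these values can be overridden on the command line with dot notation, e.g. interaction.episodes=1000, and a comma-separated list of values combined with -m launches one run per value.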
Extract itemsets formed by highly profitable items from a dataset.
You can use Chess, Mushroom, Accidents_10%, and Connect as datasets.
Run: hui_train.py
python hui_train.py dataset/hui=chess,mushroom,accidents10per,connect -m
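For reference, the quantity thresholded here is the itemset's utility: the sum, over transactions containing the itemset, of the utilities of its items. A minimal sketch under an assumed data layout (each transaction maps item ID to utility; the repository's internal representation may differ):

```python
def itemset_utility(itemset, transactions):
    """Utility of an itemset: over all transactions containing every item
    of the itemset, sum the utilities of exactly those items. Each
    transaction is a dict mapping item ID -> utility in that transaction
    (e.g. quantity * unit profit)."""
    return sum(
        sum(t[i] for i in itemset)
        for t in transactions
        if itemset <= t.keys()
    )
```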
Extract itemsets consisting of frequently co-occurring items from a dataset.
Chess, Mushroom, Pumsb, and Connect can be used as datasets.
Run: fp_train.py
python fp_train.py dataset/fp=chess,mushroom,pumsb,connect agent.lambda_end=0.6 agent.network=simple interaction.episodes=1000 -m
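The measure thresholded here is support, the fraction of transactions containing the itemset. A minimal sketch (transactions assumed to be sets of item IDs):

```python
def support(itemset, transactions):
    """Support of an itemset: the fraction of transactions that contain
    every item of the itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)
```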
Extract rules consisting of correlated items from a dataset.
Chess, Mushroom, Pumsb, and Connect can be used as datasets.
Run: ar_train.py
python ar_train.py dataset/ar=chess,mushroom,pumsb,connect interaction.episodes=1000 -m
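A standard correlation measure for association rules is confidence: the number of transactions containing both the rule's antecedent and its consequent, divided by the number containing the antecedent. A minimal sketch (transactions assumed to be sets of item IDs):

```python
def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent:
    count(antecedent and consequent together) / count(antecedent)."""
    ante = sum(1 for t in transactions if antecedent <= t)
    if ante == 0:
        return 0.0
    both = sum(1 for t in transactions if antecedent | consequent <= t)
    return both / ante
```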
Apply an agent trained on a source dataset to another target dataset. You can use the dataset available for each mining task. The first 60% and the remaining 40% of the dataset are used as the source and target partitions, respectively.
Run: transfer_hui_train.py
python transfer_hui_train.py dataset/transfer=hui_chess,hui_mushroom,hui_accidents10per,hui_connect agent.test_lambda_start=0.5 -m
Run: transfer_fp_train.py
python transfer_fp_train.py dataset/transfer=fp_chess,fp_mushroom,fp_pumsb,fp_connect agent.lambda_end=0.6 agent.test_lambda_end=0.6 agent.network=simple -m
Run: transfer_ar_train.py
python transfer_ar_train.py dataset/transfer=ar_chess,ar_mushroom,ar_pumsb,ar_connect agent.lambda_start=0.5 -m
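The source/target split described above can be sketched as follows (a hypothetical helper for illustration, not part of the repository):

```python
def split_transactions(transactions, source_ratio=0.6):
    """Split a transaction list into a source partition (the first 60%
    by default) and a target partition (the remaining 40%)."""
    cut = int(len(transactions) * source_ratio)
    return transactions[:cut], transactions[cut:]
```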
If you get an error when installing the libraries during setup, or cannot create the container, you can delete the pip entry in dev_env.yml and run the setup again. The dependencies without Docker are the following.
It is recommended to run the source code on a GPU. If a GPU is unavailable, you can run without one by changing the base image in the Dockerfile to one that does not use CUDA and removing the --gpus all
parameter from the Makefile.
If you find this code useful, please cite the following paper:
@ARTICLE{9676615,
author={Fujioka, Kazuma and Shirahama, Kimiaki},
journal={IEEE Access},
title={Generic Itemset Mining Based on Reinforcement Learning},
year={2022},
volume={10},
number={},
pages={5824-5841},
doi={10.1109/ACCESS.2022.3141806}
}