PPM-Data-Quality
This repository contains the source code of the paper Assessing Quality of Event Logs for Business Process Predictions.
Installation
Clone the repo and run the following commands
cd PPM-Data-Quality
conda create --name ppmdq python=3.7.
conda activate ppmdq
python setup.py install
pip install -r requirements.txt
Experiments
Run the following commands from PPM-Data-Quality folder
export CUDA_VISIBLE_DEVICES=0,1 #Specify GPU number(s)
I. Class Imbalance
Step 1: Basic Set-up
Set models and logs in 'default' mode (line 58: prediction_evaluation.py)
Set the same models and logs in 'CI' mode as well (line 51: prediction_evaluation.py)
Step 2: Get Default results
python prediction_evaluation.py --exp default --save_folder results_default
Step 3: Compute Class Imbalance Score on default features
python class_imbalance_target.py --folder results_default
Step 4: Compute Case-level results
python case_eval.py --folder results_default
Step 5: Get results after class imbalance remediations (undersampling)
python prediction_evaluation.py --exp CI --balancing_technique NM --save_folder results_nm
python prediction_evaluation.py --exp CI --balancing_technique CONN --save_folder results_conn
python prediction_evaluation.py --exp CI --balancing_technique NCR --save_folder results_ncr
Step 6: Compute Class Imbalance Score on undersampled features
python class_imbalance_target.py --folder results_nm
python class_imbalance_target.py --folder results_conn
python class_imbalance_target.py --folder results_ncr
Step 7: Compute Case-level results after undersampling
python case_eval.py --folder results_nm
python case_eval.py --folder results_conn
python case_eval.py --folder results_ncr
II. Class Overlap
Step 1: Compute Class Overlap (F1 & F2) Score on default features
python class_overlap.py --folder results_default
III. Missing Values
Step 1: Set models and pre-processed logs in 'MV' mode (line 58: prediction_evaluation.py)
Step 2: Get results on logs with filled missing values
python prediction_evaluation.py --exp MV --save_folder results_missing_values
IV. Outlier Filtering
Step 1: To only filter outliers
python prediction_evaluation.py --exp default --filter_percentage 10 --save_folder results_outliers
Step 2: To filter outliers first and then balance the dataset
python prediction_evaluation.py --exp CI --filter_percentage 10 --balancing_technique NM --save_folder results
NOTE: For each model, checkpoints are saved as .pth files in results_folder/models/run0/
(It saves the model's best during training)
The repository is primarily built upon the MPPN Repository -> https://github.com/joLahann/mppn