
GNNUERS: Explaining Unfairness in GNNs for Recommendation

GNNUERS generates explanations in the form of user-item interactions that make a GNN-based recommender system favor one demographic group over another.
GNNUERS learns a perturbation vector that modifies the adjacency matrix representing the training network. The edges modified by the perturbation vector are the explanations generated by the framework.
GNNUERS therefore works on a slightly extended version of a recommender system that includes the perturbation vector. In our study we applied the framework to GCMC, LightGCN and NGCF, all provided in the Recbole library, on which GNNUERS depends for data handling, training and evaluation. The extended models themselves, however, are independent of the Recbole library.
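
Conceptually, the perturbation can be pictured as a learnable soft mask over the existing edges. The following is a minimal sketch under that reading, not the framework's actual implementation; all names are illustrative:

import torch

# Toy bipartite adjacency (users x items); 1 = observed interaction.
adj = torch.tensor([[1., 0., 1.],
                    [0., 1., 1.]])

# Learnable perturbation vector, one entry per existing edge.
p = torch.nn.Parameter(torch.zeros(int(adj.sum().item())))

mask = torch.sigmoid(p)        # soft edge mask in (0, 1)
pert_adj = adj.clone()
pert_adj[adj.bool()] = mask    # perturbed adjacency fed to the GNN

# After optimizing p against a fairness objective, edges whose mask
# falls below 0.5 are treated as deleted: those edges are the explanations.
deleted_edges = (mask < 0.5).nonzero()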

Cite

This repository contains the source code of the paper GNNUERS: Fairness Explanation in GNNs for Recommendation via Counterfactual Reasoning.

If you find this repository useful for your research or development, please cite our paper as:

@article{10.1145/3655631,
author = {Medda, Giacomo and Fabbri, Francesco and Marras, Mirko and Boratto, Ludovico and Fenu, Gianni},
title = {GNNUERS: Fairness Explanation in GNNs for Recommendation via Counterfactual Reasoning},
year = {2024},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
issn = {2157-6904},
url = {https://doi.org/10.1145/3655631},
doi = {10.1145/3655631},
abstract = {Nowadays, research into personalization has been focusing on explainability and fairness. Several approaches proposed in recent works are able to explain individual recommendations in a post-hoc manner or by explanation paths. However, explainability techniques applied to unfairness in recommendation have been limited to finding user/item features mostly related to biased recommendations. In this paper, we devised a novel algorithm that leverages counterfactuality methods to discover user unfairness explanations in the form of user-item interactions. In our counterfactual framework, interactions are represented as edges in a bipartite graph, with users and items as nodes. Our bipartite graph explainer perturbs the topological structure to find an altered version that minimizes the disparity in utility between the protected and unprotected demographic groups. Experiments on four real-world graphs coming from various domains showed that our method can systematically explain user unfairness on three state-of-the-art GNN-based recommendation models. Moreover, an empirical evaluation of the perturbed network uncovered relevant patterns that justify the nature of the unfairness discovered by the generated explanations. The source code and the preprocessed data sets are available at https://github.com/jackmedda/RS-BGExplainer.},
note = {Just Accepted},
journal = {ACM Trans. Intell. Syst. Technol.},
month = {apr},
keywords = {Recommender Systems, User Fairness, Explanation, Graph Neural Networks, Counterfactual Reasoning}
}

Requirements

Our framework was tested on Python 3.9. GNNUERS can be installed using the commands in install-env.sh by passing the pytorch backend as argument, i.e., cpu or cu***, where *** represents the CUDA version, such as 118 or 121. For instance, for CUDA 12.1:

./install-env.sh cu121

The file is configured to install pytorch (and the corresponding torch_geometric, torch_sparse, torch_scatter) pinned to version 2.1.2. For other versions, install-env.sh must be modified accordingly. GNNUERS can also be installed directly through the file requirements.txt as follows:

pip install -r gnnuers/requirements.txt

requirements.txt contains the same command line arguments for pip that are included in the install-env.sh file. Some dependencies related to PyTorch, e.g., torch-scatter, can be hard to retrieve directly from pip depending on the PyTorch and CUDA version you are using, so you should point pip to the PyTorch Geometric wheel index that stores the right library versions. For instance, to install the right version of torch-scatter for PyTorch 1.12.0 you should use the following command:

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu or cu***, where *** represents the CUDA version, e.g., 116 or 117.

NOTE!
The Recbole Dataset class does not support custom dataset splits like ours, and we cannot guarantee that, even if such support is provided in new versions, it will match our modification. Hence, we implemented a modified Dataset on top of the Recbole one, which supports custom data splits and is used to perform our experiments.

The current versions of Recbole also contain a bug in the NGCF model: a Dropout layer is instantiated inside the forward method, which makes the generation of new embeddings (after the perturbation) non-reproducible even if eval is called on the model. To run our experiments, we fixed this issue by creating an extended NGCF version on top of the respective Recbole model.
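
For clarity, here is a minimal sketch (not the actual Recbole or GNNUERS code) of why a Dropout module created inside forward ignores eval():

import torch
import torch.nn as nn

class BuggyNGCFLike(nn.Module):
    def forward(self, x):
        # A freshly constructed Dropout defaults to training mode, so
        # model.eval() has no effect on it: outputs stay stochastic.
        return nn.Dropout(p=0.5)(x)

class FixedNGCFLike(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered as a submodule, so eval() switches it off.
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, x):
        return self.dropout(x)

x = torch.ones(3)
print(BuggyNGCFLike().eval()(x))  # varies across calls despite eval()
print(FixedNGCFLike().eval()(x))  # deterministic: dropout disabled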

Datasets

The datasets used in our experiments are MovieLens 1M, Last.FM 1K, Ta Feng, and Insurance, and can be downloaded from Zenodo. They should be placed in a folder named dataset in the project root folder, i.e., next to the config and gnnuers folders, e.g.:

|-- config/*
|-- dataset
|   |-- ml-1m
|   |-- lastfm-1k
|-- gnnuers/*

Usage

The file main.py is the entry point for every step to execute our pipeline.

1. Configuration

GNNUERS scripts are based on Recbole-like config files, which can be found in the config folder. The structure is hierarchical: the file base_explainer.yaml can be used to set the parameters shared by all the experiments, while the necessary parameters are specified per dataset. For each dataset there is a config file for:

  • training: it is named after the dataset, e.g. ml-1m.yaml for MovieLens-1M, tafeng.yaml for Ta Feng
  • explaining: the suffix _explainer is added to training config filename, e.g. ml-1m_explainer.yaml for MovieLens 1M, tafeng_explainer.yaml for Ta Feng

Descriptions of the training config parameters can be found in the Recbole repository and website, except for this part:

eval_args:
    split: {'LRS': None}
    order: RO  # not relevant
    group_by: '-'
    mode: 'full'

where LRS (Load Ready Splits) is not a Recbole split type, but it is added in our modified_recbole_dataset.py to support custom data splits.

The description of each parameter in the explaining config type can be found in the respective files. In particular, for the explainer_policies (an example snippet follows the list):

  • force_removed_edges: should always be True to reproduce our results; it represents the policy that prevents a previously deleted edge from being restored, so that edge deletions follow a monotonic trend
  • edge_additions: True => edges are added, not removed
  • exp_rec_data: "test" => the ground truth labels of the test set are used to measure the approximated NDCG
  • only_adv_group: "local" => the global issue is measured w.r.t. each batch
  • perturb_adv_group: the group to be perturbed. False to perturb the disadvantaged group (used when adding edges); True to perturb the advantaged group (used when removing edges)
  • group_deletion_constraint: the Connected Nodes (CN) policy
  • random_perturbation: if True, executes the baseline algorithm RND-P
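
As an illustration only, such policies might appear in a dataset's explainer config as follows; the exact keys and nesting should be checked against the provided *_explainer.yaml files:

explainer_policies:
    force_removed_edges: True
    edge_additions: False            # delete edges rather than add them
    exp_rec_data: 'test'
    only_adv_group: 'local'
    perturb_adv_group: True          # perturb the advantaged group when removing
    group_deletion_constraint: True  # Connected Nodes (CN) policy
    random_perturbation: False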

2. Train Recommender System

The recommender systems first need to be trained:

python -m gnnuers.main --run train --model MODEL --dataset DATASET --config_file_list config/TRAINING_CONFIG.yaml

where MODEL should be one of [GCMC, LightGCN, NGCF], DATASET should match a folder inside dataset, e.g., insurance or ml-1m, and TRAINING_CONFIG should be a config file of the training type.
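
For example, to train LightGCN on MovieLens 1M with the config naming described above:

python -m gnnuers.main --run train --model LightGCN --dataset ml-1m --config_file_list config/ml-1m.yaml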

3. Train GNNUERS explainer

python -m gnnuers.main --run explain --model MODEL --dataset DATASET --config_file_list config/TRAINING_CONFIG.yaml --explainer_config_file config/EXPLAINING_CONFIG.yaml --model_file saved/MODEL_FILE

where MODEL, DATASET and TRAINING_CONFIG were already explained above. EXPLAINING_CONFIG should be the explainer config file relative to the same dataset, and MODEL_FILE should be the model checkpoint saved in the saved folder by the training step.
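
For example, continuing the LightGCN/MovieLens 1M case (the checkpoint filename below is illustrative; use the actual file produced in the saved folder):

python -m gnnuers.main --run explain --model LightGCN --dataset ml-1m --config_file_list config/ml-1m.yaml --explainer_config_file config/ml-1m_explainer.yaml --model_file saved/LightGCN-ml-1m.pth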

GNNUERS Output

GNNUERS creates a folder gnnuers/experiments/dp_explanations/DATASET/MODEL/dpbg/LOSS_TYPE/SENSITIVE_ATTRIBUTE/epochs_EPOCHS/CONF_ID, where SENSITIVE_ATTRIBUTE can be one of [gender, age], EPOCHS is the number of epochs used to train GNNUERS, and CONF_ID is the configuration/run ID of the experiment just run. The folder contains:

  • the EXPLAINING_CONFIG file, in yaml and pkl format, used for the experiment
  • cf_data.pkl: information about the perturbed edges for each epoch
  • model_rec_test_preds.pkl: the original recommendations on the rec (perturbation) set and test set
  • users_order.pkl: the user ids in the order in which model_rec_test_preds.pkl is sorted
  • checkpoint.pth: data used to resume the training if stopped early

The cf_data.pkl file contains a list of lists, where each inner list has 6 values relative to the perturbed edges at a certain epoch (a loading sketch follows the list):

  1. GNNUERS total loss
  2. GNNUERS distance loss
  3. GNNUERS fair loss
  4. fairness measured with the fair_metric (absolute difference of NDCG)
  5. the perturbed edges in a 2xN array, where the first row contains the user ids, the second the item ids, such that each one of the N columns is a perturbed edge
  6. epoch relative to the generated explanations
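
A minimal sketch of how this file might be inspected, assuming standard pickle serialization and the field order listed above (the experiment path is illustrative):

import pickle

exp_dir = "gnnuers/experiments/dp_explanations/ml-1m/LightGCN/dpbg/..."  # hypothetical path

with open(f"{exp_dir}/cf_data.pkl", "rb") as f:
    cf_data = pickle.load(f)

for total_loss, dist_loss, fair_loss, fair_metric, pert_edges, epoch in cf_data:
    # pert_edges is a 2xN array: row 0 holds user ids, row 1 item ids
    print(f"epoch {epoch}: {pert_edges.shape[1]} perturbed edges, "
          f"|ΔNDCG| = {fair_metric:.4f}")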

Plotting

The scripts inside the folder scripts can be used to plot the results presented in the paper. They should be run from the root folder of this project. eval_info.py can be used as follows:

python scripts/eval_info.py --e gnnuers/experiments/dp_explanations/DATASET/MODEL/dpbg/LOSS_TYPE/SENSITIVE_ATTRIBUTE/epochs_EPOCHS/CONF_ID

where the argument --e stands for the path of a specific experiment.