
CPFairRobust: Robustness in Fairness against Edge-level Perturbations in GNN-based Recommendation

CPFairRobust performs a poisoning-like attack that perturbs the user-item interactions so as to make a GNN-based recommender system favor one demographic group over another, disrupting the system's fairness.
CPFairRobust learns a perturbation vector that modifies the adjacency matrix representing the training network.
CPFairRobust works on a slightly extended version of a recommender system in order to learn the perturbation vector. In our study we applied our framework to GCMC, LightGCN and NGCF, all provided by the Recbole library, on which CPFairRobust depends for data handling, training and evaluation. The provided models themselves, however, are independent of the Recbole library.

Cite

This repository contains the source code of the paper Robustness in Fairness against Edge-level Perturbations in GNN-based Recommendation.

If you find this repository useful for your research or development, cite our paper as:

@inproceedings{conf/ecir/BorattoFFMM24,
  author       = {Ludovico Boratto and
                  Francesco Fabbri and
                  Gianni Fenu and
                  Mirko Marras and
                  Giacomo Medda},
  title        = {Robustness in Fairness against Edge-level Perturbations in GNN-based Recommendation},
  booktitle    = {Advances in Information Retrieval - 46th European Conference on {IR}
                  Research, {ECIR} 2024, Glasgow, Scotland, March 24-28, 2024},
  series       = {Lecture Notes in Computer Science},
  publisher    = {Springer},
  year         = {2024},
}

Requirements

Our framework was tested on Python 3.9 with the libraries listed in requirements.txt, which can be installed with:

pip install -r cpfair_robust/requirements.txt

Some dependencies related to PyTorch, e.g. torch-scatter, can be hard to install directly from pip depending on the PyTorch and CUDA versions you are using, so you should point pip to the wheel index hosting the right library versions. For instance, to install the right version of torch-scatter for PyTorch 1.12.0 you should use the following command:

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+${CUDA}.html

where ${CUDA} should be replaced by either cpu or cu***, where *** is the CUDA version, e.g. 116 or 117.
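The URL pattern above can be expressed as a tiny helper; this is a hypothetical illustration (the function name is ours, not part of the repo), just to make the substitution of the PyTorch and CUDA versions explicit:

```python
# Hypothetical helper (not part of the repo): build the PyG wheel index
# URL for a given PyTorch / CUDA combination.
def pyg_wheel_index(torch_version, cuda):
    # cuda is either "cpu" or a string like "cu116" / "cu117"
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda}.html"

print(pyg_wheel_index("1.12.0", "cu116"))
# https://data.pyg.org/whl/torch-1.12.0+cu116.html
```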

Datasets DOI

The datasets used in our experiments are MovieLens 1M, Last.FM 1K and Insurance, and can be downloaded from Zenodo. They should be placed in a folder named dataset in the project root, next to the config and cpfair_robust folders, e.g.:

|-- config/*
|-- dataset
|   |-- ml-1m
|   |-- lastfm-1k
|-- cpfair_robust/*

Usage

The file main.py is the entry point for every step to execute in our pipeline.

1. Configuration

CPFairRobust scripts are based on similar Recbole config files that can be found in the config folder. The structure is hierarchical: the file base_explainer.yaml can be used to set the parameters shared by all the experiments, and each dataset specifies the necessary parameters on top of it. For each dataset there is a config file for:

  • training: it is named after the dataset, e.g. ml-1m.yaml for MovieLens-1M, tafeng.yaml for Ta Feng
  • explaining: the suffix _explainer is added to the training config filename, e.g. ml-1m_explainer.yaml for MovieLens 1M, tafeng_explainer.yaml for Ta Feng

The descriptions of the training config parameters can be found in the Recbole repository and website, except for this part:

eval_args:
    split: {'LRS': None}
    order: RO  # not relevant
    group_by: '-'
    mode: 'full'

where LRS (Load Ready Splits) is not a Recbole split type, but it is added in our modified_recbole_dataset.py to support custom data splits.

The description of each parameter in the explaining config type can be found in the respective files. In particular, for the explainer_policies:

  • exp_rec_data: "test" => the ground truth labels of the test set are used to measure the utility metrics
  • only_adv_group: "local" => the fairness level is measured w.r.t. each batch
  • gradient_deactivation_constraint: always False
  • perturb_adv_group: it has no effect if gradient_deactivation_constraint is False

For each fairness type the parameters to specify are:
  • CP
    • exp_metric: consumer_DP
    • metric_loss: ndcg # approximated loss to evaluate the fairness level
    • eval_metric: ndcg # metric to evaluate the fairness level
    • sensitive_attribute: age # or "gender"
  • CS
    • exp_metric: consumer_DP
    • metric_loss: softmax # approximated loss to evaluate the fairness level => the name is inaccurate, but the loss is the one explained in the paper
    • eval_metric: precision # metric to evaluate the fairness level
    • sensitive_attribute: age # or "gender"
  • PE
    • exp_metric: provider_DP
    • item_discriminative_attribute: exposure
  • PV
    • exp_metric: provider_DP
    • item_discriminative_attribute: visibility

Whether edges are added or deleted is controlled by the following parameter:

  • edge_additions: True # to add edges, False to delete them
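For instance, a CP setup could combine the parameters above as in the following sketch; the exact nesting of the keys is illustrative and should be checked against the per-dataset _explainer.yaml files:

```yaml
exp_metric: consumer_DP
metric_loss: ndcg          # approximated loss used during training
eval_metric: ndcg          # metric used to evaluate the fairness level
sensitive_attribute: gender
edge_additions: False      # delete edges instead of adding them
explainer_policies:
    exp_rec_data: "test"
    only_adv_group: "local"
    gradient_deactivation_constraint: False
```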

2. Train Recommender System

The recommender systems first need to be trained:

python -m cpfair_robust.main --run train --model MODEL --dataset DATASET --config_file_list config/TRAINING_CONFIG.yaml

where MODEL should be one of [GCMC, LightGCN, NGCF], DATASET should match the name of a folder inside dataset, e.g. insurance, ml-1m, and TRAINING_CONFIG should be a config file of the training type.

3. Train CPFairRobust explainer

python -m cpfair_robust.main --run explain --model MODEL --dataset DATASET --config_file_list config/TRAINING_CONFIG.yaml --explainer_config_file config/EXPLAINING_CONFIG.yaml --model_file saved/MODEL_FILE

where MODEL, DATASET and TRAINING_CONFIG are as described above. EXPLAINING_CONFIG should be the explaining config file of the same dataset.

CPFairRobust Output

CPFairRobust creates a folder cpfair_robust/experiments/dp_explanations/DATASET/MODEL/dpbg/LOSS_TYPE/SENSITIVE_ATTRIBUTE/epochs_EPOCHS/CONF_ID, where SENSITIVE_ATTRIBUTE can be one of [gender, age] (it is omitted if exp_metric is provider_DP), EPOCHS is the number of epochs used to train the CPFairRobust attack, and CONF_ID is the configuration/run ID of the experiment just run. The folder contains:

  • the EXPLAINING_CONFIG file, in yaml and pkl format, used for the experiment
  • cf_data.pkl: the information about the perturbed edges for each epoch
  • model_rec_test_preds.pkl: the original recommendations on the rec (perturbation) set and the test set
  • users_order.pkl: the user ids in the order the predictions in model_rec_test_preds.pkl are sorted
  • checkpoint.pth: data used to resume the training if stopped early
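Since users_order.pkl gives the order in which model_rec_test_preds.pkl is sorted, the two files can be joined by position. A minimal sketch (the helper name and the assumption that both pickles hold parallel sequences are ours, not the repo's):

```python
import pickle

def pair_users_with_preds(users_path, preds_path):
    """Hypothetical helper: map each user id from users_order.pkl to its
    row in model_rec_test_preds.pkl, assuming the two files are parallel
    sequences of the same length."""
    with open(users_path, "rb") as f:
        users = pickle.load(f)
    with open(preds_path, "rb") as f:
        preds = pickle.load(f)
    # zip relies on the positional correspondence described above
    return dict(zip(users, preds))
```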

The cf_data.pkl file contains a list of lists, where each inner list has 6 values relative to the perturbed edges at a certain epoch:

  1. CPFairRobust total loss
  2. CPFairRobust distance loss
  3. CPFairRobust fair loss
  4. fairness measured with the fair_metric (absolute difference of NDCG)
  5. the perturbed edges in a 2xN array, where the first row contains the user ids, the second the item ids, such that each one of the N columns is a perturbed edge
  6. epoch relative to the generated explanations
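The 6-value layout above can be unpacked directly when inspecting a run; the following is a sketch (the helper name is ours), assuming cf_data.pkl holds the list of per-epoch entries as described:

```python
import pickle

def summarize_cf_data(path):
    """Hypothetical helper: for each epoch entry of cf_data.pkl, return the
    epoch, the total loss, the measured fairness and the number of
    perturbed edges, following the 6-value layout described above."""
    with open(path, "rb") as f:
        cf_data = pickle.load(f)
    summary = []
    for total_loss, dist_loss, fair_loss, fairness, pert_edges, epoch in cf_data:
        # pert_edges is a 2xN structure: row 0 = user ids, row 1 = item ids,
        # so the number of perturbed edges is the number of columns N
        summary.append((epoch, total_loss, fairness, len(pert_edges[0])))
    return summary
```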

Plotting

The scripts inside the scripts folder can be used to reproduce the plots reported in the paper. They should be run from the root folder of this project. cpfair_robust_eval.py can be used as follows:

python scripts/cpfair_robust_eval.py --e cpfair_robust/experiments/dp_explanations/DATASET/MODEL/dpbg/LOSS_TYPE/SENSITIVE_ATTRIBUTE/epochs_EPOCHS/CONF_ID

where the argument --e is the path of a specific experiment. The other files starting with cpfair generate specific plots of the paper.