- PyGDebias: Attributed Network Datasets and Fairness-Aware Graph Mining Algorithms
Graph mining algorithms have been playing a critical role in a plethora of areas. However, most of the existing ones lack fairness consideration. Consequently, they may deliver biased predictions toward certain demographic subgroups or individuals. To better understand existing debiasing techniques and facilitate the deployment of fairness-aware graph mining algorithms, we developed this library PyGDebias featured for built-in datasets and implementations of popular fairness-aware graph mining algorithms for the study of algorithmic fairness on graphs.
Specifically, this open-source library PyGDebias aims to provide a systematic schema to load datasets and compare different debiasing techniques for graph learning algorithms. Specifically, 26 graph datasets (including 24 commonly used ones and two newly constructed ones, AMiner-L and AMiner-S) are collected, and 13 algorithms are implemented in this library.
Our survey paper "Fairness in Graph Mining: A Survey" has been accepted by TKDE and released on arxiv [PDF]. If you find PyGDebias helpful, we would appreciate citations to the following paper:
@article{dong2023fairness,
title={Fairness in graph mining: A survey},
author={Dong, Yushun and Ma, Jing and Wang, Song and Chen, Chen and Li, Jundong},
journal={IEEE Transactions on Knowledge and Data Engineering},
year={2023},
publisher={IEEE}
}
or:
Dong, Y., Ma, J., Wang, S., Chen, C., & Li, J. (2023). Fairness in graph mining: A survey. IEEE Transactions on Knowledge and Data Engineering.
We summarize the basic API of the implemented graph mining algorithms as below.
-
fit(): Execute the training process for the initiated graph mining algorithm.
-
predict(): Evaluate the trained graph mining algorithm on the test set.
Here, we provide guidelines for setting up the library. There are basically 2 ways to install it
# Set up the environment
conda create -n pygdebias python=3.8
conda activate pygdebias
# Installation
git clone https://github.com/yushundong/PyGDebias.git
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install PyGDebias/ -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html
conda create -n pygdebias python=3.8
conda activate pygdebias
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install pygdebias==1.0.1 -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html
from pygdebias.debiasing import GUIDE
from pygdebias.datasets import Nba
# Available choices: 'Credit', 'German', 'Facebook', 'Pokec_z', 'Pokec_n', 'Nba', 'Twitter', 'Google', 'LCC', 'LCC_small', 'Cora', 'Citeseer', 'Amazon', 'Yelp', 'Epinion', 'Ciao', 'Dblp', 'Filmtrust', 'Lastfm', 'Ml-100k', 'Ml-1m', 'Ml-20m', 'Oklahoma', 'UNC', 'Bail'.
nba = Nba()
adj, features, idx_train, idx_val, idx_val, idx_test, labels, sens = nba.adj(), nba.features(), nba.idx_train(), nba.idx_val(), nba.idx_test(), nba.labels(), nba.sens()
# Initiate the model (with default parameters).
model = GUIDE()
# Train the model.
model.fit(adj, features, idx_train, idx_val, idx_test, labels, sens)
# Evaluate the model.
model.predict()
26 graph datasets are collected in this library, including 24 commonly used ones and two newly constructed ones (Aminer-L and Aminer-S). We provide their descriptions as follows.
- Facebook: Facebook has anonymized features for each node representative of various attributes of a person’s Facebook profile.
- Pokec_z & Pokec_n: The two datasets are sampled from Pokec by province. Pokec contains anonymized data of the whole social network in 2012, in which the profiles contain gender, age, hobbies, interest, education, working field and etc.
- NBA: This dataset is an extension of a Kaggle dataset containing around 400 NBA basketball players. Features include the performance statistics of players in the 2016-2017 season and other various information e.g., nationality, age, and salary.
- German: The dataset is a credit graph which has 1,000 nodes representing clients in a German bank that are connected based on the similarity of their credit accounts.
- Credit: Credit contains individuals which are connected based on the similarity of their spending and payment patterns.
- Recidivism: Recidivism has 18,876 nodes representing defendants who got released on bail at the U.S state courts during 1990-2009, where defendants are connected based on the similarity of past criminal records and demographics.
- Google+: The dataset is created by data collected from a social networking platform developed by Google.
- AMiner-L & AMiner-S: AMiner-L and AMiner-S are co-author networks with sensitive attributes ready for algorithmic fairness study on graphs. Specifically, we construct AMiner-L (titled 'LCC' in our built-in data loader) and AMiner-S (titled 'LCC_small' in our built-in data loader) based on the AMiner network [1]. To construct the two datasets, we first filter out the nodes in the AMiner network with incomplete information. Then we adopt two different approaches to sample a connected network from the filtered dataset: AMiner-L is a subgraph sampled with random walk, while AMiner-S is the largest connected component of the filtered AMiner network. In both datasets, nodes represent the researchers in different fields, and edges denote the co-authorship between researchers. The sensitive attribute is the continent of the affiliation each researcher belongs to, and the labels of nodes represent the primary research field of each researcher.
- Cora: The Cora dataset is a collection of computer science research papers categorized into different topics.
- Citeseer: The CiteSeer dataset is a digital library of scientific articles, primarily focused on computer science, featuring citation relationships and widely utilized for research in information retrieval and citation network analysis.
- Pubmed: Pubmed contains citation networks that consider articles as nodes and descriptions of articles as their nodal attributes.
- Amazon: The dataset constitutes a respective knowledge graph with entities and relations crawled from Amazon. The collection consists of four different domains: CDs and Vinyl, Clothing, Cell Phones, and Beauty.
- Yelp: The Yelp dataset is a large collection of user-generated reviews and associated ratings for businesses, encompassing various industries and geographical locations.
- Ciao: The Ciao dataset is a comprehensive collection of product reviews and ratings from the Ciao shopping website.
- DBLP: The DBLP dataset is a bibliographic database containing computer science research publications, authors, and their relationships.
- Filmtrust: The Filmtrust dataset is a collection of movie ratings and trust relationships between users.
- Lastfm: The Last.fm dataset is a music dataset that contains user listening histories, artist information, and user preferences.
- ML100k: The ML-100K dataset is a widely used benchmark dataset in the field of recommender systems, containing movie ratings and user information for evaluating collaborative filtering algorithms.
- ML1m: The ML-1M dataset is a movie rating dataset that contains one million ratings from users on various movies.
- ML20m: The ML-20M dataset is a larger movie rating dataset consisting of 20 million ratings from users on a vast collection of movies.
- Oklahoma: Oklahoma is a dataset composed of social networks of the University of Oklahoma. A link represents a friendship relation in social media, and every user has a profile for vertex features, including student/faculty status, gender, and major,
- UNC: UNC is a dataset of social networks in the University of North Carolina at Chapel Hill.
We provide their statistics as follows.
#Nodes | #Edges | #Features | |
---|---|---|---|
1,045 | 53,498 | 573 | |
Pokec_z | 67,796 | 882,765 | 276 |
Pokec_n | 66,569 | 729,129 | 265 |
NBA | 403 | 16,570 | 95 |
German | 1,000 | 24,970 | 27 |
Credit | 30,000 | 200,526 | 13 |
Recidivism | 18,876 | 403,977 | 18 |
Google+ | 290,468 | 3,601 | 2,532 |
AMiner-L | 129,726 | 591,039 | 5,694 |
AMiner-S | 39,424 | 52,460 | 5,694 |
Cora | 2,708 | 4,751 | 1,433 |
Citeseer | 3,312 | 4,194 | 3,703 |
Pubmed | 19,717 | 88,648 | 500 |
Amazon | 2,549 (item) 2 (genre) | 2,549 | N/A |
Yelp | 2,834 (item) 2 (genre) | 2,834 | N/A |
Ciao | 7,375 (user) 106,797 (product) | 57,544 | N/A |
DBLP | 22,166 (user) 296,277 (product) | 355,813 | N/A |
Filmtrust | 1,508 (user) 2,071 (item) | 35,497 | N/A |
Lastfm | 1,892 (customer) 17,632 (producer) | 92,800 | N/A |
ML100k | 943 (user) 1,682 (item) | 100,000 | 4 |
ML1m | 6,040 (user) 3,952 (item) | 1,000,209 | 4 |
ML20m | 138,493 (user) 27,278 (item) | 20,000,263 | N/A |
Oklahoma | 3,111 | 73,230 | N/A |
UNC | 4,018 | 65,287 | N/A |
13 different methods in total are implemented in this library. We provide an overview of their characteristics as follows.
- FairGNN: FairGNN is a GNN model designed to address fairness issues in graph-based tasks by incorporating fairness regularization techniques, ensuring equitable treatment of different groups in the learned representations and predictions.
- EDITS: EDITS approximates the inputs’ discrimination via Wasserstein distance and directly minimizes it between sensitive and nonsensitive groups by pruning the graph topology and node features.
- CrossWalk: CrossWalk is a method that enhances fairness in graph algorithms by biasing random walks to cross group boundaries and extends the range of the weighting including multi-hop neighbors.
- UGE: UGE learns node embeddings from an underlying bias-free graph, which is not influenced by sensitive node attributes to deal with the unbiased graph embedding problem.
- FairVGNN: FairVGNN is an advanced model that intelligently masks feature channels linked to sensitive attributes and dynamically fine-tunes encoder weights to reduce the impact of sensitive information, resulting in a fairer and unbiased machine learning algorithm.
- FairEdit: NIFTY goes beyond simply removing biases from the input graph during training. It takes a proactive approach by introducing artificial nodes and edges to address biases comprehensively, resulting in a more balanced and unbiased learning process.
- NIFTY: NIFTY is a groundbreaking approach designed to improve counterfactual fairness and stability of node representations. It achieves this through the utilization of a novel triplet-based objective function and layer-wise weight normalization using the Lipschitz constant, ensuring enhanced performance and reliability.
- GEAR: GEAR tackles graph unfairness through two key mechanisms: counterfactual graph augmentation and an adversarial learning approach that focuses on learning embeddings that are insensitive to sensitive attributes. These techniques collectively contribute to the effective mitigation of unfairness in graph-based models.
- InFoRM: InFoRM utilizes the similarity matrix of nodes to assess the individual fairness of GNNs. By incorporating a variant of the (d1, d2)-Lipschitz property, it addresses the challenge of achieving individual fairness solely based on node features, considering the interconnectedness of nodes in a graph.
- REDRESS: REDRESS introduces a plug-and-play framework that leverages individual fairness measures from a ranking perspective. This approach optimizes the training of Graph Neural Networks (GNNs) to simultaneously maximize utility and promote ranking-based individual fairness. By learning to rank, the framework ensures consistent relative ranking orders of node pairs in both input and output spaces.
- GUIDE: GUIDE is an innovative GNN framework that leverages the similarity matrix of individuals to learn personalized attention mechanisms. This enables the achievement of individual fairness while minimizing disparities at the group level.
- RawlsGCN: RawlsGCN integrates the Rawlsian difference principle into GCN, mitigating degree-related unfairness and improving its overall performance.
Methods | Debiasing Technique | Fairness Notions | Paper & Code |
---|---|---|---|
FairGNN [2] | Adversarial Learning | Group Fairness | [Paper] [Code] |
EDITS [3] | Edge Rewiring | Group Fairness | [Paper] [Code] |
FairWalk [4] | Rebalancing | Group Fairness | [Paper] [Code] |
CrossWalk [5] | Rebalancing | Group Fairness | [Paper] [Code] |
UGE [6] | Edge Rewiring | Group Fairness | [Paper] [Code] |
FairVGNN [7] | Adversarial Learning | Group Fairness | [Paper] [Code] |
FairEdit [8] | Edge Rewiring | Group Fairness | [Paper] [Code] |
NIFTY [9] | Optimization with Regularization | Group/Counterfactual Fairness | [Paper] [Code] |
GEAR [10] | Edge Rewiring | Group/Counterfactual Fairness | [Paper] [Code] |
InFoRM [11] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
REDRESS [12] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
GUIDE [13] | Optimization with Regularization | Individual Fairness | [Paper] [Code] |
RawlsGCN [14] | Rebalancing | Degree-Related Fairness | [Paper] [Code] |
We summarize the performances of the implemented 13 graph mining algorithms/frameworks by fairness notions, including group fairness, individual fairness, counterfactual fairness, and degree-related fairness.
We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including
Credit | Credit | Credit | Credit | Credit | Recidivism | Recidivism | Recidivism | Recidivism | Recidivism | |
---|---|---|---|---|---|---|---|---|---|---|
AUCROC | F1 | Acc | AUCROC | F1 | Acc | |||||
GCN_Vanilla | 0.722±0.005 | 0.876±0.002 | 0.784±0.004 | 0.160±0.110 | 0.108±0.073 | 0.865±0.020 | 0.672±0.031 | 0.804±0.014 | 0.094±0.012 | 0.130±0.023 |
FairGNN | 0.729±0.004 | 0.879±0.002 | 0.790±0.004 | 0.022±0.007 | 0.020±0.005 | 0.929±0.002 | 0.820±0.004 | 0.880±0.002 | 0.073±0.006 | 0.046±0.006 |
NIFTY | 0.717±0.000 | 0.876±0.000 | 0.782±0.001 | 0.019±0.019 | 0.014±0.010 | 0.841±0.006 | 0.403±0.008 | 0.716±0.002 | 0.017±0.001 | 0.006±0.003 |
EDITS | 0.667±0.012 | 0.876±0.000 | 0.779±0.000 | 0.005±0.002 | 0.004±0.002 | 0.847±0.026 | 0.607±0.058 | 0.773±0.029 | 0.026±0.012 | 0.013±0.001 |
FairVGNN | 0.654±0.034 | 0.865±0.018 | 0.772±0.015 | 0.030±0.040 | 0.023±0.032 | 0.826±0.009 | 0.722±0.015 | 0.796±0.020 | 0.090±0.009 | 0.089±0.014 |
FairEdit | 0.666±0.067 | 0.809±0.032 | 0.716±0.032 | 0.107±0.046 | 0.098±0.045 | 0.884±0.007 | 0.798±0.009 | 0.852±0.007 | 0.070±0.002 | 0.044±0.002 |
We present the evaluation results of fairness (including classification accuracy on the learned embeddings) on Pokec_z and Pokec_n below.
Classification Acc of Gender on Pokec_z | Classification Acc of Gender on Pokec_n | |
---|---|---|
FairWalk | 0.4962 ± 0.0031 | 0.5016 ± 0.0036 |
CrossWalk | 0.4943 ± 0.0007 | 0.5047 ± 0.0036 |
UGE-C | 0.4908 ± 0.00008 | 0.5002 ± 0.0001 |
We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including
Credit | Credit | Credit | Credit | Credit | Credit | Credit | Recidivism | Recidivism | Recidivism | Recidivism | Recidivism | Recidivism | Recidivism | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AUCROC | F1 | Acc | AUCROC | F1 | Acc | |||||||||
GCN_Vanilla | 0.684 ± 0.019 | 0.794± 0.027 | 0.698± 0.028 | 0.108± 0.031 | 0.087± 0.035 | 0.042± 0.029 | 0.022± 0.014 | 0.885 ± 0.018 | 0.782 ± 0.023 | 0.838 ± 0.017 | 0.075 ± 0.014 | 0.023 ± 0.019 | 0.132 ± 0.059 | 0.075 ± 0.028 |
NIFTY-GCN | 0.685 ± 0.007 | 0.792± 0.007 | 0.697± 0.007 | 0.106± 0.021 | 0.097± 0.024 | 0.004± 0.004 | 0.017± 0.003 | 0.799 ± 0.051 | 0.669 ± 0.050 | 0.752 ± 0.065 | 0.036 ± 0.022 | 0.019 ± 0.015 | 0.031 ± 0.017 | 0.025 ± 0.018 |
GEAR-GCN | 0.740 ± 0.008 | 0.835± 0.008 | 0.755± 0.011 | 0.104± 0.013 | 0.086± 0.018 | 0.001± 0.001 | 0.010± 0.003 | 0.896 ± 0.016 | 0.800 ± 0.031 | 0.852 ± 0.026 | 0.058 ± 0.017 | 0.019 ± 0.023 | 0.003 ± 0.002 | 0.038 ± 0.012 |
We present the evaluation results of both utility (including AUCROC) and fairness (including IF, GDIF, and Ranking-based IF) on Credit and Recidivism below.
Credit | Credit | Credit | Credit | Recidivism | Recidivism | Recidivism | Recidivism | |
---|---|---|---|---|---|---|---|---|
AUCROC | IF (in |
GDIF | Ranking-based IF | AUC | IF (in |
GDIF | Ranking-Based IF | |
GCN | 0.666±0.023 | 190.408±135.412 | 1.628±0.596 | 0.611±0.003 | 0.962±0.001 | 1.32e6±6.19e5 | 1.12±0.001 | 0.965±0.001 |
InFoRM | 0.645±0.010 | 23.560±8.760 | 2.422±0.143 | 0.621±0.014 | 0.651±0.043 | 11.606±4.28 | 1.106±0.133 | 0.971±0.001 |
REDRESS | 0.570±0.005 | 8.827e11±1.95e11 | 3.048±0.301 | 0.913±0.005 | 0.687±0.067 | 4.0393e12±6.84e12 | 1.816±0.594 | 0.980±0.002 |
GUIDE | 0.632±0.008 | 0.337±0.063 | 1.003±0.002 | 0.665±0.010 | 0.797±0.024 | 9.629±0.831 | 1.006±0.003 | 0.972±0.001 |
We present the evaluation results of both utility (including accuracy) and fairness (including bias according to Rawlsian difference principle) on Amazon-Photo dataset below.
Accuracy | Bias (according to Rawlsian difference principle) | |
---|---|---|
GCN | 0.8262 ±0.0090 | 0.5033±0.1552 |
RawlsGCN | 0.8708 ± 0.0134 | 0.0782±0.0071 |
.
├── LICENSE
├── MANIFEST.in
├── README.md
├── docs
├── pygdebias
│ ├── init.py
│ ├── datasets
│ ├── debiasing
│ └── metrics
├── requirements.txt
├── setup.cfg
└── setup.py
You are welcome to become part of this project. See contribute guide for more information.
Yushun Dong, Song Wang, Zaiyi Zheng, Zhenyu Lei, Alex Jing Huang, Jing Ma, Chen Chen, Jundong Li
We extend our heartfelt appreciation to everyone who has contributed to and will contribute to this work.
Reach out to us by submitting an issue report or sending an email to yd6eb@virginia.edu.