PyGDebias: Attributed Network Datasets and Fairness-Aware Graph Mining Algorithms

PyGDebias: Attributed Network Datasets and Fairness-Aware Graph Mining Algorithms

Graph mining algorithms have been playing a critical role in a plethora of areas. However, most of the existing ones lack fairness consideration. Consequently, they may deliver biased predictions toward certain demographic subgroups or individuals. To better understand existing debiasing techniques and facilitate the deployment of fairness-aware graph mining algorithms, we developed this library PyGDebias featured for built-in datasets and implementations of popular fairness-aware graph mining algorithms for the study of algorithmic fairness on graphs.

Specifically, this open-source library PyGDebias aims to provide a systematic schema to load datasets and compare different debiasing techniques for graph learning algorithms. Specifically, 26 graph datasets (including 24 commonly used ones and two newly constructed ones, AMiner-L and AMiner-S) are collected, and 13 algorithms are implemented in this library.

1. Citation

Our survey paper "Fairness in Graph Mining: A Survey" has been accepted by TKDE and released on arxiv [PDF]. If you find PyGDebias helpful, we would appreciate citations to the following paper:

@article{dong2023fairness,
  title={Fairness in graph mining: A survey},
  author={Dong, Yushun and Ma, Jing and Wang, Song and Chen, Chen and Li, Jundong},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2023},
  publisher={IEEE}
}

or:

Dong, Y., Ma, J., Wang, S., Chen, C., & Li, J. (2023). Fairness in graph mining: A survey. IEEE Transactions on Knowledge and Data Engineering.

2. API Cheatsheet

We summarize the basic API of the implemented graph mining algorithms as below.

fit(): Execute the training process for the initiated graph mining algorithm.
predict(): Evaluate the trained graph mining algorithm on the test set.

3. Installations

Here, we provide guidelines for setting up the library. There are basically 2 ways to install it

3.1 Manually

# Set up the environment
conda create -n pygdebias python=3.8
conda activate pygdebias

# Installation
git clone https://github.com/yushundong/PyGDebias.git
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install PyGDebias/ -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html

3.2 `pip`

conda create -n pygdebias python=3.8
conda activate pygdebias
pip install torch==1.12.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
pip install pygdebias==1.0.1 -f https://data.pyg.org/whl/torch-1.12.0%2Bcu116.html -f https://download.pytorch.org/whl/torch_stable.html

4. Usage & Examples

from pygdebias.debiasing import GUIDE
from pygdebias.datasets import Nba
# Available choices: 'Credit', 'German', 'Facebook', 'Pokec_z', 'Pokec_n', 'Nba', 'Twitter', 'Google', 'LCC', 'LCC_small', 'Cora', 'Citeseer', 'Amazon', 'Yelp', 'Epinion', 'Ciao', 'Dblp', 'Filmtrust', 'Lastfm', 'Ml-100k', 'Ml-1m', 'Ml-20m', 'Oklahoma', 'UNC', 'Bail'.

nba = Nba()
adj, features, idx_train, idx_val, idx_val, idx_test, labels, sens = nba.adj(), nba.features(), nba.idx_train(), nba.idx_val(), nba.idx_test(), nba.labels(), nba.sens()

# Initiate the model (with default parameters).
model = GUIDE()

# Train the model.
model.fit(adj, features, idx_train, idx_val, idx_test, labels, sens)

# Evaluate the model.
model.predict()

5. Collected Datasets

26 graph datasets are collected in this library, including 24 commonly used ones and two newly constructed ones (Aminer-L and Aminer-S). We provide their descriptions as follows.

Facebook: Facebook has anonymized features for each node representative of various attributes of a person’s Facebook profile.
Pokec_z & Pokec_n: The two datasets are sampled from Pokec by province. Pokec contains anonymized data of the whole social network in 2012, in which the profiles contain gender, age, hobbies, interest, education, working field and etc.
NBA: This dataset is an extension of a Kaggle dataset containing around 400 NBA basketball players. Features include the performance statistics of players in the 2016-2017 season and other various information e.g., nationality, age, and salary.
German: The dataset is a credit graph which has 1,000 nodes representing clients in a German bank that are connected based on the similarity of their credit accounts.
Credit: Credit contains individuals which are connected based on the similarity of their spending and payment patterns.
Recidivism: Recidivism has 18,876 nodes representing defendants who got released on bail at the U.S state courts during 1990-2009, where defendants are connected based on the similarity of past criminal records and demographics.
Google+: The dataset is created by data collected from a social networking platform developed by Google.
AMiner-L & AMiner-S: AMiner-L and AMiner-S are co-author networks with sensitive attributes ready for algorithmic fairness study on graphs. Specifically, we construct AMiner-L (titled 'LCC' in our built-in data loader) and AMiner-S (titled 'LCC_small' in our built-in data loader) based on the AMiner network [1]. To construct the two datasets, we first filter out the nodes in the AMiner network with incomplete information. Then we adopt two different approaches to sample a connected network from the filtered dataset: AMiner-L is a subgraph sampled with random walk, while AMiner-S is the largest connected component of the filtered AMiner network. In both datasets, nodes represent the researchers in different fields, and edges denote the co-authorship between researchers. The sensitive attribute is the continent of the affiliation each researcher belongs to, and the labels of nodes represent the primary research field of each researcher.
Cora: The Cora dataset is a collection of computer science research papers categorized into different topics.
Citeseer: The CiteSeer dataset is a digital library of scientific articles, primarily focused on computer science, featuring citation relationships and widely utilized for research in information retrieval and citation network analysis.
Pubmed: Pubmed contains citation networks that consider articles as nodes and descriptions of articles as their nodal attributes.
Amazon: The dataset constitutes a respective knowledge graph with entities and relations crawled from Amazon. The collection consists of four different domains: CDs and Vinyl, Clothing, Cell Phones, and Beauty.
Yelp: The Yelp dataset is a large collection of user-generated reviews and associated ratings for businesses, encompassing various industries and geographical locations.
Ciao: The Ciao dataset is a comprehensive collection of product reviews and ratings from the Ciao shopping website.
DBLP: The DBLP dataset is a bibliographic database containing computer science research publications, authors, and their relationships.
Filmtrust: The Filmtrust dataset is a collection of movie ratings and trust relationships between users.
Lastfm: The Last.fm dataset is a music dataset that contains user listening histories, artist information, and user preferences.
ML100k: The ML-100K dataset is a widely used benchmark dataset in the field of recommender systems, containing movie ratings and user information for evaluating collaborative filtering algorithms.
ML1m: The ML-1M dataset is a movie rating dataset that contains one million ratings from users on various movies.
ML20m: The ML-20M dataset is a larger movie rating dataset consisting of 20 million ratings from users on a vast collection of movies.
Oklahoma: Oklahoma is a dataset composed of social networks of the University of Oklahoma. A link represents a friendship relation in social media, and every user has a profile for vertex features, including student/faculty status, gender, and major,
UNC: UNC is a dataset of social networks in the University of North Carolina at Chapel Hill.

We provide their statistics as follows.

	#Nodes	#Edges	#Features
Facebook	1,045	53,498	573
Pokec_z	67,796	882,765	276
Pokec_n	66,569	729,129	265
NBA	403	16,570	95
German	1,000	24,970	27
Credit	30,000	200,526	13
Recidivism	18,876	403,977	18
Google+	290,468	3,601	2,532
AMiner-L	129,726	591,039	5,694
AMiner-S	39,424	52,460	5,694
Cora	2,708	4,751	1,433
Citeseer	3,312	4,194	3,703
Pubmed	19,717	88,648	500
Amazon	2,549 (item) 2 (genre)	2,549	N/A
Yelp	2,834 (item) 2 (genre)	2,834	N/A
Ciao	7,375 (user) 106,797 (product)	57,544	N/A
DBLP	22,166 (user) 296,277 (product)	355,813	N/A
Filmtrust	1,508 (user) 2,071 (item)	35,497	N/A
Lastfm	1,892 (customer) 17,632 (producer)	92,800	N/A
ML100k	943 (user) 1,682 (item)	100,000	4
ML1m	6,040 (user) 3,952 (item)	1,000,209	4
ML20m	138,493 (user) 27,278 (item)	20,000,263	N/A
Oklahoma	3,111	73,230	N/A
UNC	4,018	65,287	N/A

6. Collected Algorithms

13 different methods in total are implemented in this library. We provide an overview of their characteristics as follows.

FairGNN: FairGNN is a GNN model designed to address fairness issues in graph-based tasks by incorporating fairness regularization techniques, ensuring equitable treatment of different groups in the learned representations and predictions.
EDITS: EDITS approximates the inputs’ discrimination via Wasserstein distance and directly minimizes it between sensitive and nonsensitive groups by pruning the graph topology and node features.
CrossWalk: CrossWalk is a method that enhances fairness in graph algorithms by biasing random walks to cross group boundaries and extends the range of the weighting including multi-hop neighbors.
UGE: UGE learns node embeddings from an underlying bias-free graph, which is not influenced by sensitive node attributes to deal with the unbiased graph embedding problem.
FairVGNN: FairVGNN is an advanced model that intelligently masks feature channels linked to sensitive attributes and dynamically fine-tunes encoder weights to reduce the impact of sensitive information, resulting in a fairer and unbiased machine learning algorithm.
FairEdit: NIFTY goes beyond simply removing biases from the input graph during training. It takes a proactive approach by introducing artificial nodes and edges to address biases comprehensively, resulting in a more balanced and unbiased learning process.
NIFTY: NIFTY is a groundbreaking approach designed to improve counterfactual fairness and stability of node representations. It achieves this through the utilization of a novel triplet-based objective function and layer-wise weight normalization using the Lipschitz constant, ensuring enhanced performance and reliability.
GEAR: GEAR tackles graph unfairness through two key mechanisms: counterfactual graph augmentation and an adversarial learning approach that focuses on learning embeddings that are insensitive to sensitive attributes. These techniques collectively contribute to the effective mitigation of unfairness in graph-based models.
InFoRM: InFoRM utilizes the similarity matrix of nodes to assess the individual fairness of GNNs. By incorporating a variant of the (d1, d2)-Lipschitz property, it addresses the challenge of achieving individual fairness solely based on node features, considering the interconnectedness of nodes in a graph.
REDRESS: REDRESS introduces a plug-and-play framework that leverages individual fairness measures from a ranking perspective. This approach optimizes the training of Graph Neural Networks (GNNs) to simultaneously maximize utility and promote ranking-based individual fairness. By learning to rank, the framework ensures consistent relative ranking orders of node pairs in both input and output spaces.
GUIDE: GUIDE is an innovative GNN framework that leverages the similarity matrix of individuals to learn personalized attention mechanisms. This enables the achievement of individual fairness while minimizing disparities at the group level.
RawlsGCN: RawlsGCN integrates the Rawlsian difference principle into GCN, mitigating degree-related unfairness and improving its overall performance.

Methods	Debiasing Technique	Fairness Notions	Paper & Code
FairGNN [2]	Adversarial Learning	Group Fairness	[Paper] [Code]
EDITS [3]	Edge Rewiring	Group Fairness	[Paper] [Code]
FairWalk [4]	Rebalancing	Group Fairness	[Paper] [Code]
CrossWalk [5]	Rebalancing	Group Fairness	[Paper] [Code]
UGE [6]	Edge Rewiring	Group Fairness	[Paper] [Code]
FairVGNN [7]	Adversarial Learning	Group Fairness	[Paper] [Code]
FairEdit [8]	Edge Rewiring	Group Fairness	[Paper] [Code]
NIFTY [9]	Optimization with Regularization	Group/Counterfactual Fairness	[Paper] [Code]
GEAR [10]	Edge Rewiring	Group/Counterfactual Fairness	[Paper] [Code]
InFoRM [11]	Optimization with Regularization	Individual Fairness	[Paper] [Code]
REDRESS [12]	Optimization with Regularization	Individual Fairness	[Paper] [Code]
GUIDE [13]	Optimization with Regularization	Individual Fairness	[Paper] [Code]
RawlsGCN [14]	Rebalancing	Degree-Related Fairness	[Paper] [Code]

7. Performance Leaderboards

We summarize the performances of the implemented 13 graph mining algorithms/frameworks by fairness notions, including group fairness, individual fairness, counterfactual fairness, and degree-related fairness.

7.1 Group Fairness

7.1.1 GNN-based ones:

We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including $\Delta_{SP}$ and $\Delta_{EO}$) on Credit and Recidivism below.

	Credit	Credit	Credit	Credit	Credit	Recidivism	Recidivism	Recidivism	Recidivism	Recidivism
	AUCROC	F1	Acc	$\Delta_{SP}$	$\Delta_{EO}$	AUCROC	F1	Acc	$\Delta_{SP}$	$\Delta_{EO}$
GCN_Vanilla	0.722±0.005	0.876±0.002	0.784±0.004	0.160±0.110	0.108±0.073	0.865±0.020	0.672±0.031	0.804±0.014	0.094±0.012	0.130±0.023
FairGNN	0.729±0.004	0.879±0.002	0.790±0.004	0.022±0.007	0.020±0.005	0.929±0.002	0.820±0.004	0.880±0.002	0.073±0.006	0.046±0.006
NIFTY	0.717±0.000	0.876±0.000	0.782±0.001	0.019±0.019	0.014±0.010	0.841±0.006	0.403±0.008	0.716±0.002	0.017±0.001	0.006±0.003
EDITS	0.667±0.012	0.876±0.000	0.779±0.000	0.005±0.002	0.004±0.002	0.847±0.026	0.607±0.058	0.773±0.029	0.026±0.012	0.013±0.001
FairVGNN	0.654±0.034	0.865±0.018	0.772±0.015	0.030±0.040	0.023±0.032	0.826±0.009	0.722±0.015	0.796±0.020	0.090±0.009	0.089±0.014
FairEdit	0.666±0.067	0.809±0.032	0.716±0.032	0.107±0.046	0.098±0.045	0.884±0.007	0.798±0.009	0.852±0.007	0.070±0.002	0.044±0.002

7.1.2 Non-GNN-based ones:

We present the evaluation results of fairness (including classification accuracy on the learned embeddings) on Pokec_z and Pokec_n below.

	Classification Acc of Gender on Pokec_z	Classification Acc of Gender on Pokec_n
FairWalk	0.4962 ± 0.0031	0.5016 ± 0.0036
CrossWalk	0.4943 ± 0.0007	0.5047 ± 0.0036
UGE-C	0.4908 ± 0.00008	0.5002 ± 0.0001

7.2 Counterfactual Fairness

We present the evaluation results of both utility (including AUCROC, F1 score, and accuracy) and fairness (including $\Delta_{SP}$, $\Delta_{EO}$, $\delta_{CF}$, and $R^2$) on Credit and Recidivism below.

	Credit	Credit	Credit	Credit	Credit	Credit	Credit	Recidivism	Recidivism	Recidivism	Recidivism	Recidivism	Recidivism	Recidivism
	AUCROC	F1	Acc	$\Delta_{SP}$	$\Delta_{EO}$	$\delta_{CF}$	$R^2$	AUCROC	F1	Acc	$\Delta_{SP}$	$\Delta_{EO}$	$\delta_{CF}$	$R^2$
GCN_Vanilla	0.684 ± 0.019	0.794± 0.027	0.698± 0.028	0.108± 0.031	0.087± 0.035	0.042± 0.029	0.022± 0.014	0.885 ± 0.018	0.782 ± 0.023	0.838 ± 0.017	0.075 ± 0.014	0.023 ± 0.019	0.132 ± 0.059	0.075 ± 0.028
NIFTY-GCN	0.685 ± 0.007	0.792± 0.007	0.697± 0.007	0.106± 0.021	0.097± 0.024	0.004± 0.004	0.017± 0.003	0.799 ± 0.051	0.669 ± 0.050	0.752 ± 0.065	0.036 ± 0.022	0.019 ± 0.015	0.031 ± 0.017	0.025 ± 0.018
GEAR-GCN	0.740 ± 0.008	0.835± 0.008	0.755± 0.011	0.104± 0.013	0.086± 0.018	0.001± 0.001	0.010± 0.003	0.896 ± 0.016	0.800 ± 0.031	0.852 ± 0.026	0.058 ± 0.017	0.019 ± 0.023	0.003 ± 0.002	0.038 ± 0.012

7.3 Individual Fairness

We present the evaluation results of both utility (including AUCROC) and fairness (including IF, GDIF, and Ranking-based IF) on Credit and Recidivism below.

	Credit	Credit	Credit	Credit	Recidivism	Recidivism	Recidivism	Recidivism
	AUCROC	IF (in $10^3$)	GDIF	Ranking-based IF	AUC	IF (in $10^3$)	GDIF	Ranking-Based IF
GCN	0.666±0.023	190.408±135.412	1.628±0.596	0.611±0.003	0.962±0.001	1.32e6±6.19e5	1.12±0.001	0.965±0.001
InFoRM	0.645±0.010	23.560±8.760	2.422±0.143	0.621±0.014	0.651±0.043	11.606±4.28	1.106±0.133	0.971±0.001
REDRESS	0.570±0.005	8.827e11±1.95e11	3.048±0.301	0.913±0.005	0.687±0.067	4.0393e12±6.84e12	1.816±0.594	0.980±0.002
GUIDE	0.632±0.008	0.337±0.063	1.003±0.002	0.665±0.010	0.797±0.024	9.629±0.831	1.006±0.003	0.972±0.001

7.4 Degree-Related Fairness

We present the evaluation results of both utility (including accuracy) and fairness (including bias according to Rawlsian difference principle) on Amazon-Photo dataset below.

	Accuracy	Bias (according to Rawlsian difference principle)
GCN	0.8262 ±0.0090	0.5033±0.1552
RawlsGCN	0.8708 ± 0.0134	0.0782±0.0071

Folder Structure

.
├── LICENSE
├── MANIFEST.in
├── README.md
├── docs
├── pygdebias
│ ├── init.py
│ ├── datasets
│ ├── debiasing
│ └── metrics
├── requirements.txt
├── setup.cfg
└── setup.py