This is the official repository of the WWW'24 paper "Fast Inference of Removal-Based Node Influence". We provide a new perspective on evaluating node influence: the task-specific influence of a node on a GNN model's predictions, measured by removing that node. We use graph neural network (GNN) models as surrogates to learn the underlying message propagation patterns on a graph. After training a GNN model, we remove a node, apply the trained model to the modified graph, and use the change in output to measure the influence of the removed node. NORA (NOde-Removal-based fAst GNN inference) is an efficient method that approximates this influence for all nodes. Our implementation is based on the dgl library [1], a GitHub implementation of DrGAT, and the official GitHub repository of TIMME [2].
[1] Minjie Wang, et al. 2019. Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315.
[2] Zhiping Xiao, Weiping Song, Haoyan Xu, Zhicheng Ren, and Yizhou Sun. 2020. TIMME: Twitter ideology-detection via multi-task multi-relational embedding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2258–2268.
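The removal-based influence described above can be illustrated with a minimal sketch. It is not the code in this repository and it is not NORA: it is a brute-force toy under stated assumptions. The function names `gcn_forward` and `removal_influence` are hypothetical, the "trained model" is a single GCN-style propagation step with random weights, and the influence score is assumed to be the summed absolute change in the predictions of the remaining nodes.

```python
import numpy as np

def gcn_forward(adj, feats, weight):
    """Toy stand-in for a trained GNN (assumption): one symmetric-normalized
    propagation step, a linear map, and a row-wise softmax."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees are >= 1
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    logits = a_norm @ feats @ weight
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def removal_influence(adj, feats, weight):
    """Brute force: remove each node in turn, rerun the model, and score the
    node by the total absolute change in the other nodes' predictions
    (this particular score is an assumption made for illustration)."""
    n = adj.shape[0]
    base = gcn_forward(adj, feats, weight)
    influence = np.zeros(n)
    for r in range(n):
        keep = np.arange(n) != r                     # mask out node r
        out = gcn_forward(adj[np.ix_(keep, keep)], feats[keep], weight)
        influence[r] = np.abs(out - base[keep]).sum()
    return influence

# Toy graph: path 0 - 1 - 2 plus an isolated node 3.
adj = np.zeros((4, 4))
adj[0, 1] = adj[1, 0] = 1.0
adj[1, 2] = adj[2, 1] = 1.0

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))    # random node features (illustrative)
weight = rng.normal(size=(3, 2))   # random weights in place of training

influence = removal_influence(adj, feats, weight)
print(influence)
```

The loop runs one full forward pass per removed node, which is the cost that an approximation method such as NORA is meant to avoid. In the toy graph, the isolated node has no effect on the other nodes, while removing the middle node of the path changes both of its neighbors' predictions.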
- For the Planetoid datasets (Cora, CiteSeer, and PubMed), please refer to the readme of the "./planetoid" directory.
- For the ogbn-arxiv dataset, please refer to the readme of the "./arxiv" directory.
- For the Twitter datasets (P50, P_20_50), please refer to the readme of the "./TIMME" directory.
To generate the ground-truth node influence with the brute-force method, please run:
python main.py --dataset Cora --model GCN
You can substitute the dataset name and model name with the ones you want. To use a link prediction model, append "_edge" to the model name, such as "GCN_edge", "GraphSAGE_edge", etc. The supported dataset names and GNN model names are:
- "Cora", "CiteSeer", and "PubMed" datasets: "GCN", "GraphSAGE", "GAT", "GCNII", "GCN_edge", "GraphSAGE_edge", "GAT_edge", "GCNII_edge"
- "ogbn-arxiv" dataset: "GCN", "GraphSAGE", "DrGAT", "GCNII", "GCN_edge", "GraphSAGE_edge", "GAT_edge", "GCNII_edge"
- "P50" and "P_20_50" datasets: "TIMME", "TIMME_edge"
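As a convenience, the supported combinations listed above can be written down as a small lookup table that validates a dataset/model pair and builds the corresponding command line. This is a hypothetical helper, not part of the repository: the names `SUPPORTED` and `build_command` are illustrative, and the sketch assumes that the `main.py` entry point and the `--dataset` and `--model` flags shown above apply to every dataset, which the per-dataset READMEs may refine.

```python
import shlex

NODE_MODELS = ["GCN", "GraphSAGE", "GAT", "GCNII"]
EDGE_MODELS = ["GCN_edge", "GraphSAGE_edge", "GAT_edge", "GCNII_edge"]

# Dataset -> supported model names, copied from the list above.
SUPPORTED = {
    "Cora": NODE_MODELS + EDGE_MODELS,
    "CiteSeer": NODE_MODELS + EDGE_MODELS,
    "PubMed": NODE_MODELS + EDGE_MODELS,
    "ogbn-arxiv": ["GCN", "GraphSAGE", "DrGAT", "GCNII"] + EDGE_MODELS,
    "P50": ["TIMME", "TIMME_edge"],
    "P_20_50": ["TIMME", "TIMME_edge"],
}

def build_command(dataset, model):
    """Return the argument list for one run, or raise ValueError if the
    dataset/model pair is not in the supported list."""
    if dataset not in SUPPORTED:
        raise ValueError(f"unsupported dataset: {dataset}")
    if model not in SUPPORTED[dataset]:
        raise ValueError(f"model {model} is not supported for {dataset}")
    return ["python", "main.py", "--dataset", dataset, "--model", model]

# Build (but do not execute) one command per supported combination.
commands = [build_command(d, m) for d, models in SUPPORTED.items() for m in models]
for cmd in commands:
    print(shlex.join(cmd))
```

The commands are only printed here. To launch them, each argument list could be passed to `subprocess.run` from the directory that contains the relevant `main.py`.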
To run NORA, we provide the script with our hyper-parameters in "nora.sh". Please choose the entry that matches the dataset and GNN model you want to use.
For the mask method, we provide the script with our hyper-parameters in "mask.sh". Please choose the entry that matches the dataset and GNN model you want to use.
For the prediction methods, we provide the scripts with our hyper-parameters in "gcn_n.sh" and "gcn_e.sh". "gcn_n.sh" runs the "Predict-N" method, and "gcn_e.sh" runs the "Predict-E" method. Please choose the entry that matches the dataset and GNN model you want to use.
If you want to experiment with other hyper-parameters, please refer to the annotations in the "args" settings in main.py.
We use evaluate.py for evaluation. To evaluate the approximation performance, use the evaluation commands provided in "nora.sh", "mask.sh", "gcn_n.sh", and "gcn_e.sh", choosing the entry that matches the dataset and GNN model you want to use. For other evaluation functions, please see the "args" settings in evaluate.py.