This is a PyTorch implementation of the paper *Exploring Correlations of Self-Supervised Tasks for Graphs*, accepted at ICML 2024. We quantitatively characterize the correlations between different graph self-supervised tasks and obtain more effective graph self-supervised representations with our proposed GraphTCM.
We used the following packages under Python 3.10:
- pytorch 2.1.1
- torch-geometric 2.4.0
- matplotlib 3.5.0
- pandas 2.1.3
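These can be installed with pip, for example (the exact command may vary with your CUDA setup, and torch-geometric may need additional optional wheels such as torch-scatter):

```bash
pip install torch==2.1.1 torch-geometric==2.4.0 matplotlib==3.5.0 pandas==2.1.3
```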
Existing graph self-supervised methods can be categorized into four primary types: feature-based (FB), structure-based (SB), auxiliary property-based (APB), and contrast-based (CB). To comprehensively understand the complex relationships among graph self-supervised tasks, we have chosen two representative methods from each category for detailed analysis.
- GraphComp (https://github.com/Shen-Lab/SS-GCNs). Its objective is to reconstruct the masked features, teaching the network to extract features from the context.
- AttributeMask (https://github.com/ChandlerBang/SelfTask-GNN). It aims to reconstruct the dense feature matrix generated by Principal Component Analysis (PCA) rather than the raw features.
- GAE (https://github.com/DaehanKim/vgae_pytorch). It aims to reconstruct the adjacency matrix from the node representations (see the inner-product decoder sketch after this list).
- EdgeMask (https://github.com/ChandlerBang/SelfTask-GNN). It aims to acquire finer-grained local structural information by employing link prediction as a pretext task.
- NodeProp (https://github.com/ChandlerBang/SelfTask-GNN). It utilizes a node-level pretext task, predicting properties for individual nodes, including attributes such as degree, local node importance, and local clustering coefficient.
- DisCluster (https://github.com/ChandlerBang/SelfTask-GNN). It performs regression on the distances between each node and predefined graph clusters.
- DGI (https://github.com/PetarV-/DGI). It maximizes the mutual information between representations of subgraphs at different scales, helping the graph encoder capture both local and global semantic information.
- SubgCon (https://github.com/yzjiao/Subg-Con). It captures regional structural information by exploiting the strong correlation between central nodes and their sampled subgraphs.
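To make the structure-based objective concrete, here is a minimal sketch of the inner-product decoder used by GAE-style models; the tensor shapes and the toy adjacency below are illustrative and not taken from this repository.

```python
import torch
import torch.nn.functional as F

# Toy node embeddings: 5 nodes with 16-dimensional representations.
z = torch.randn(5, 16)

# Inner-product decoder: the predicted probability of an edge (i, j)
# is sigmoid(z_i . z_j), yielding a reconstructed adjacency matrix.
adj_recon = torch.sigmoid(z @ z.t())

# Toy ground-truth adjacency (here just self-loops) for the reconstruction loss.
adj_true = torch.eye(5)

# Binary cross-entropy between the reconstructed and true adjacency matrices.
loss = F.binary_cross_entropy(adj_recon, adj_true)
print(loss.item())
```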
We provide the representations obtained by training with these eight self-supervised methods across various datasets in the `emb/` directory.
Given the representations from two self-supervised tasks, GraphTCM characterizes the correlation between them. Please run `train_GraphTCM.py` to train a GraphTCM model on a specific dataset.
```
usage: train_GraphTCM.py [-h] [--hidden_dim HIDDEN_DIM] [--pooling POOLING] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM] [--lr LR] [--seed SEED] [--valid_rate VALID_RATE] [--dataset DATASET]

PyTorch implementation for building the correlation.

options:
  -h, --help               show this help message and exit
  --hidden_dim HIDDEN_DIM  hidden dimension
  --pooling POOLING        pooling type
  --device_num DEVICE_NUM  device number
  --epoch_num EPOCH_NUM    epoch number
  --lr LR                  learning rate
  --seed SEED              random seed
  --valid_rate VALID_RATE  validation rate
  --dataset DATASET        dataset
```
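For example (the dataset name and hyper-parameter values below are placeholders, not the settings used in the paper):

```bash
python train_GraphTCM.py --dataset Cora --hidden_dim 256 --lr 0.001 --epoch_num 100 --seed 0
```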
After training a GraphTCM model, please run `train_emb.py` to obtain more effective self-supervised representations. To facilitate further experiments, we also provide the trained representations based on GraphTCM in the `emb/` directory, all named `GraphTCM.pkl`.
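As a rough sketch, one of these files could be loaded as follows (the exact path layout under `emb/` and the type of the stored object are assumptions, so inspect it before use):

```python
import pickle

# Hypothetical path; the provided files are named GraphTCM.pkl, but the
# per-dataset directory layout under emb/ may differ.
with open("emb/GraphTCM.pkl", "rb") as f:
    representations = pickle.load(f)

print(type(representations))
```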
```
usage: train_emb.py [-h] [--hidden_dim HIDDEN_DIM] [--device_num DEVICE_NUM] [--epoch_num EPOCH_NUM] [--lr LR] [--seed SEED] [--dataset DATASET] [--path PATH] [--target TARGET] [--train_method TRAIN_METHOD]

PyTorch implementation for training the representations.

options:
  -h, --help                   show this help message and exit
  --hidden_dim HIDDEN_DIM      hidden dimension
  --device_num DEVICE_NUM      device number
  --epoch_num EPOCH_NUM        epoch number
  --lr LR                      learning rate
  --seed SEED                  random seed
  --dataset DATASET            dataset
  --path PATH                  path for the trained GraphTCM model
  --target TARGET              training target (ones or zeros)
  --train_method TRAIN_METHOD  training method
```
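For example (the dataset name, model path, and training method below are placeholders):

```bash
python train_emb.py --dataset Cora --path <path_to_trained_GraphTCM_model> --target ones --train_method <train_method>
```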
We have provided scripts with hyper-parameter settings to reproduce the experimental results presented in our paper. Please run `run.sh` under `downstream/` to obtain the downstream results across various datasets.
```bash
cd downstream/
sh run.sh
```
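For reference, downstream evaluation of frozen self-supervised representations is often done with a simple linear probe, as in the sketch below; this only illustrates the general protocol on random placeholder data and is not the exact evaluation code in `downstream/`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder frozen node representations and labels; in practice these would
# come from the files in emb/ and the dataset's node labels.
emb = torch.randn(100, 256)
labels = torch.randint(0, 7, (100,))

# Linear classifier trained on top of the frozen representations.
probe = nn.Linear(256, 7)
opt = torch.optim.Adam(probe.parameters(), lr=0.01)

for _ in range(100):
    opt.zero_grad()
    loss = F.cross_entropy(probe(emb), labels)
    loss.backward()
    opt.step()

acc = (probe(emb).argmax(dim=1) == labels).float().mean().item()
print(f"train accuracy: {acc:.3f}")
```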
You can cite our paper with the following BibTeX:
```bibtex
@inproceedings{Fang2024ExploringCO,
  title={Exploring Correlations of Self-supervised Tasks for Graphs},
  author={Taoran Fang and Wei Zhou and Yifei Sun and Kaiqiao Han and Lvbin Ma and Yang Yang},
  booktitle={International Conference on Machine Learning},
  year={2024}
}
```