/S2GAE

Primary LanguagePython


S2GAE: Self-Supervised Graph Autoencoder

This is the PyG implementation for WSDM'23 paper: S2GAE: Self-Supervised Graph Autoencoders Are Generalizable Learners with Graph Masking

S2GAE is a generalized self-supervised graph representation learning method, which achieves competitive or better performance than existing state-of-the-art methods on different types of tasks including node classification, link prediction, graph classification, and molecular property prediction.

Installation

The required packages can be installed by running pip install -r requirements.txt.

Datasets

The datasets used in our paper can be automatically downlowad.

Quick Start

For quick start, you could try:

Node classification (Cora, CiteSeer, and PubMed)

python s2gae_nc_acc.py --dataset `Cora`

Link prediction (ogbl-ddi, ogbl-collab, and ogbl-ppa)

python s2gae_large_lp.py --dataset "ogbl-ddi" 

Graph classification

Experimental Results

Node classification (Accuracy, %)

Cora CiteSeer PubMed A-Compute A-Photo Coauthor-CS Coauthor-Physics Ogbn-arxiv Ogbn-proteins
MVGRL 85.86±0.15 73.18±0.22 84.86±0.31 88.70±0.24 92.15±0.20 92.87±0.13 95.35±0.08 68.33±0.31 -
BGRL 86.16±0.20 73.96±0.14 86.42±0.18 90.48±0.10 93.22±0.15 93.35±0.06 96.16±0.09 71.77±0.19 _
GraphMAE 85.45±0.40 72.48±0.77 85.74±0.14 88.04±0.61 92.73±0.17 93.47±0.04 96.13±0.03 71.86±0.00 60.99±0.21
MaskGAE 87.31±0.05 75.20±0.07 86.56±0.26 90.52±0.04 93.33±0.14 92.31±0.05 95.79±0.02 70.99±0.12 61.23±0.19
S2GAE(ours) 86.15±0.25 74.60±0.06 86.91±0.28 90.94±0.08 93.61±0.10 91.70±0.08 95.82±0.03 72.02±0.05 63.33±0.12

Link prediction (AUC)

Cora CiteSeer PubMed Blogcatalog Flickr Ogbl-ddi Ogbl-collab Ogbl-ppa
AUC AUC AUC AUC AUC AUC Hits@20 Hits@50 Hits@10
GAE 91.09±0.01 90.52±0.04 96.40±0.01 84.91±1.44 92.50±0.40 37.07±5.07 44.75±1.07 2.52±0.47
GraphMAE 89.19±0.00 91.20±0.11 93.72±0.00 76.60±1.32 88.69±0.04 - 22.79±1.62 0.18±0.28
MaskGAE 96.66±0.17 98.00±0.23 99.06±0.05 81.06±3.06 93.60±0.14 16.25±1.60 32.47±0.59 0.23±0.04
S2GAE(ours) 95.05±0.76 94.85±0.49 97.38±0.17 87.06±0.37 94.38±0.02 65.91±3.50 54.74±1.06 3.98±1.33

Graph classification (Accuracy, %)

IMDB-B IMDB-M PROTEINS COLLAB MUTAG REDDIT-B NCI1
InfoGraph 73.03±0.87 49.69±0.53 74.44±0.31 70.65±1.13 91.20±1.30 - 76.20±1.06
GraphCL 71.14±0.44 48.58±0.67 74.39±0.45 71.36±1.15 86.80±1.34 89.53±0.84 77.87±0.41
MVGRL 74.20±0.70 51.20±0.50 - - 89.70±1.10 84.50±0.60 -
GraphMAE 75.52±0.66 51.63±0.52 75.30±0.39 80.32±0.46 88.19±1.26 88.01±0.19 80.40±0.30
S2GAE(ours) 75.76±0.62 51.79±0.36 76.37±0.43 81.02±0.53 88.26±0.76 87.83±0.27 80.80±0.24

Citing

If you find this work is helpful to your research, please consider citing our paper:

@inproceedings{tan2023s2gae,
  title={S2GAE: Self-Supervised Graph Autoencoders Are Generalizable Learners with Graph Masking},
  author={Tan, Qiaoyu and Liu, Ninghao and Huang, Xiao and Choi, Soo-Hyun and Li, Li and Chen, Rui and Hu, Xia},
  booktitle={Proceedings of the 16th ACM International Conference on Web Search and Data Mining},
  year={2023}
}
@article{tan2022mgae,
  title={Mgae: Masked autoencoders for self-supervised learning on graphs},
  author={Tan, Qiaoyu and Liu, Ninghao and Huang, Xiao and Chen, Rui and Choi, Soo-Hyun and Hu, Xia},
  journal={arXiv preprint arXiv:2201.02534},
  year={2022}
}