SCARA-PPR

This is the original code for "SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization" (VLDB 2022) and "Scalable Decoupling Graph Neural Networks with Feature-Oriented Optimization" (VLDBJ 2023).

Paper - VLDB | Paper - VLDBJ | GitHub | Tech Report | arXiv

Citation

If you find this work useful, please cite our papers:

VLDBJ:

Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, and Pengcheng Yin.
Scalable Decoupling Graph Neural Networks with Feature-Oriented Optimization.
The VLDB Journal, 33, 2023. doi:10.1007/s00778-023-00829-6.

@article{liao2023scalable,
  title={Scalable Decoupling Graph Neural Networks with Feature-Oriented Optimization},
  author={Liao, Ningyi and Mo, Dingheng and Luo, Siqiang and Li, Xiang and Yin, Pengcheng},
  journal={The {VLDB} Journal},
  volume={33},
  year={2023},
  publisher={Springer},
  url={https://link.springer.com/article/10.1007/s00778-023-00829-6},
  doi={10.1007/s00778-023-00829-6}
}

VLDB:

Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, and Pengcheng Yin.
SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization.
PVLDB, 15(11): 3240-3248, 2022. doi:10.14778/3551793.3551866.

@article{liao2022scara,
  title={{SCARA}: Scalable Graph Neural Networks with Feature-Oriented Optimization},
  author={Liao, Ningyi and Mo, Dingheng and Luo, Siqiang and Li, Xiang and Yin, Pengcheng},
  journal={Proceedings of the VLDB Endowment},
  volume={15},
  number={11},
  pages={3240-3248},
  year={2022},
  publisher={VLDB Endowment},
  url={https://doi.org/10.14778/3551793.3551866}
}

Usage

We provide a complete example and its log in the demo notebook. The sample PubMed dataset is available in the data folder.

Data Preparation

Download the data (see Dataset Links) in GBP format to the path data/[dataset_name]. As in the PubMed example, each dataset consists of three files:

adj.txt: adjacency table
- First line: "# [number of nodes]"
feats.npy: features in .npy array
labels.npz: node label information
- 'label': labels (number or one-hot)
- 'idx_train/idx_val/idx_test': indices of training/validation/test nodes (inductive task)
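As a quick sanity check of the layout above, the following minimal sketch loads the three files and reports their basic shapes. It assumes that the split indices are stored under the separate keys idx_train, idx_val, and idx_test, and it reads only the header line of adj.txt, since the rest of the adjacency format is not described here. The helper name inspect_dataset is hypothetical and not part of this repository.

```python
import os
import numpy as np


def inspect_dataset(data_dir):
    """Summarize the three input files of one dataset folder.

    Assumes labels.npz stores the keys 'label', 'idx_train', 'idx_val',
    and 'idx_test' (an interpretation of the file description above).
    """
    # The first line of adj.txt is "# [number of nodes]".
    with open(os.path.join(data_dir, "adj.txt")) as f:
        header = f.readline().strip()
    if not header.startswith("#"):
        raise ValueError(f"unexpected adj.txt header: {header!r}")
    num_nodes = int(header.lstrip("#").strip())

    feats = np.load(os.path.join(data_dir, "feats.npy"))
    with np.load(os.path.join(data_dir, "labels.npz")) as labels:
        summary = {
            "num_nodes": num_nodes,
            "feats_shape": feats.shape,
            "label_shape": labels["label"].shape,
            "num_train": len(labels["idx_train"]),
            "num_val": len(labels["idx_val"]),
            "num_test": len(labels["idx_test"]),
        }
    return summary
```

Calling inspect_dataset on a data/[dataset_name] folder returns a dictionary that can be compared against the expected node count and split sizes before running the later steps.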

Run python data_processor.py to generate the additional processed files:

degrees.npz: node degrees, stored in .npz format under the key 'arr_0'
feats_norm.npy: normalized features in .npy array
- A large matrix can be split
query.txt: indices of queried nodes
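The 'arr_0' key mentioned for degrees.npz is the default name NumPy assigns to an array passed positionally to np.savez. The sketch below illustrates this with made-up degree values. It does not reproduce the logic of data_processor.py, and the file is written to a temporary directory rather than to the data folder.

```python
import os
import tempfile
import numpy as np

# Hypothetical degrees for a 4-node graph (illustrative values only).
degrees = np.array([2, 1, 3, 0], dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "degrees.npz")
    # An array passed positionally is stored under the default key 'arr_0'.
    np.savez(path, degrees)
    with np.load(path) as data:
        keys = data.files
        loaded = data["arr_0"]
```

Reading a generated degrees.npz follows the same pattern: open it with np.load and index the result with 'arr_0'.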

Precompute

Environment: CMake 3.16, C++14. Dependencies: Eigen3
Build with CMake: cmake -B build, then make
Run script: ./run_pubmed.sh

Train and Test

Install dependencies: conda create --name [envname] --file requirements.txt
Run experiment: python run.py -f [seed] -c [config_file] -v [device]
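To repeat an experiment over several seeds, the run command above can be assembled programmatically. This is a minimal sketch: build_run_command is a hypothetical helper, and the config path and device value used below are placeholders rather than files or settings confirmed by this repository.

```python
import sys


def build_run_command(seed, config_file, device):
    """Return the argument list for:
    python run.py -f [seed] -c [config_file] -v [device]
    """
    return [sys.executable, "run.py",
            "-f", str(seed), "-c", str(config_file), "-v", str(device)]


# Placeholder values; substitute a real config file and device.
commands = [build_run_command(seed, "path/to/config.json", 0) for seed in (0, 1, 2)]
```

Each entry in commands can then be passed to subprocess.run(cmd, check=True) from the repository root.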

Baseline Models

GraphSAINT: GraphSAINT
APPNP: APPNP
PPRGo: PPRGo
GBP: GBP
AGP: AGP
GAS: GAS

Dataset Links

Citeseer & Pubmed: GBP
PPI: GraphSAGE
Yelp: GraphSAINT
Reddit: PPRGo
Products & Papers100M: OGB
Amazon: Cluster-GCN
MAG: PANE
