This is the original code for "SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization" (VLDB 2022) and "Scalable Decoupling Graph Neural Networks with Feature-Oriented Optimization" (VLDBJ 2023).
Paper - VLDB | Paper - VLDBJ | GitHub | Tech Report | arXiv
If you find this work useful, please cite our papers:
Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, and Pengcheng Yin.
Scalable Decoupling Graph Neural Networks with Feature-Oriented Optimization.
The VLDB Journal, 33, 2023. doi:10.1007/s00778-023-00829-6.
@article{liao2023scalable,
title={Scalable Decoupling Graph Neural Networks with Feature-Oriented Optimization},
author={Liao, Ningyi and Mo, Dingheng and Luo, Siqiang and Li, Xiang and Yin, Pengcheng},
journal={The {VLDB} Journal},
volume={33},
year={2023},
publisher={Springer},
url={https://link.springer.com/article/10.1007/s00778-023-00829-6},
doi={10.1007/s00778-023-00829-6}
}
Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, and Pengcheng Yin.
SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization.
PVLDB, 15(11): 3240-3248, 2022. doi:10.14778/3551793.3551866.
@article{liao2022scara,
title={{SCARA}: Scalable Graph Neural Networks with Feature-Oriented Optimization},
author={Liao, Ningyi and Mo, Dingheng and Luo, Siqiang and Li, Xiang and Yin, Pengcheng},
journal={Proceedings of the VLDB Endowment},
volume={15},
number={11},
pages={3240-3248},
year={2022},
publisher={VLDB Endowment},
url = {https://doi.org/10.14778/3551793.3551866},
}
We provide a complete example and its log in the demo notebook. The sample PubMed dataset is available in the data folder.
- Download data (links below) in GBP format to path `data/[dataset_name]`. Similar to the PubMed dataset example, there are three files:
  - `adj.txt`: adjacency table
    - First line: "`# [number of nodes]`"
  - `feats.npy`: features in .npy array
  - `labels.npz`: node label information
    - 'label': labels (number or one-hot)
    - 'idx_train/idx_val/idx_test': indices of training/validation/test nodes (inductive task)
- Run command `python data_processor.py` to generate additional processed files:
  - `degrees.npz`: node degrees in .npz 'arr_0'
  - `feats_norm.npy`: normalized features in .npy array
    - Large matrix can be split
  - `query.txt`: indices of queried nodes
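
As a quick sanity check of the file layout described above, the following sketch loads the input files and reports their shapes. It is not part of the repository: the function name `check_dataset` and the returned summary are hypothetical. Only the first line of `adj.txt` is documented here, so the rest of that file is left unparsed, and no assumption is made about the orientation of the feature matrix.

```python
from pathlib import Path

import numpy as np

SPLIT_KEYS = ("idx_train", "idx_val", "idx_test")


def check_dataset(data_dir):
    """Load the files listed above and return a small summary dict.

    Hypothetical helper for illustration; not part of the repository.
    """
    data_dir = Path(data_dir)

    # adj.txt: only the header "# [number of nodes]" is documented,
    # so only that line is read.
    with open(data_dir / "adj.txt") as f:
        header = f.readline().strip()
    if not header.startswith("#"):
        raise ValueError(f"unexpected adj.txt header: {header!r}")
    num_nodes = int(header.lstrip("#").strip())

    # feats.npy: feature matrix; only its shape is reported.
    feats = np.load(data_dir / "feats.npy")

    # labels.npz: 'label' plus the three split index arrays.
    split_sizes = {}
    with np.load(data_dir / "labels.npz") as labels:
        label_shape = labels["label"].shape
        for key in SPLIT_KEYS:
            idx = labels[key]
            # Node indices must refer to existing nodes.
            if idx.size and (idx.min() < 0 or idx.max() >= num_nodes):
                raise ValueError(f"{key} contains out-of-range node indices")
            split_sizes[key] = int(idx.size)

    # degrees.npz only exists after data_processor.py has been run.
    degrees_shape = None
    degrees_path = data_dir / "degrees.npz"
    if degrees_path.exists():
        with np.load(degrees_path) as degrees:
            degrees_shape = degrees["arr_0"].shape

    return {
        "num_nodes": num_nodes,
        "feats_shape": feats.shape,
        "label_shape": label_shape,
        "split_sizes": split_sizes,
        "degrees_shape": degrees_shape,
    }
```

Calling `check_dataset("data/[dataset_name]")` with the actual dataset folder name, before and after running `data_processor.py`, shows whether the expected files are present and whether the split indices fit the node count.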
- Environment: CMake 3.16, C++ 14. Dependencies: eigen3
- CMake `cmake -B build`, then `make`
- Run script: `./run_pubmed.sh`
- Install dependencies: `conda create --name [envname] --file requirements.txt`
- Run experiment: `python run.py -f [seed] -c [config_file] -v [device]`
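
To repeat the experiment over several seeds, the run command above can be generated programmatically. The sketch below only assembles the command lines from the documented `-f`, `-c`, and `-v` flags. The helper name `build_run_commands` is hypothetical, the config path is a placeholder, and passing `0` as the device is an assumption, since the accepted device values are not specified here.

```python
import shlex


def build_run_commands(seeds, config_file, device):
    """Return one `python run.py` argument list per seed (hypothetical helper)."""
    return [
        ["python", "run.py", "-f", str(seed), "-c", str(config_file), "-v", str(device)]
        for seed in seeds
    ]


if __name__ == "__main__":
    # Placeholder config path and device value; substitute your own.
    for cmd in build_run_commands(range(3), "path/to/config_file", 0):
        print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another.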
- Citeseer & Pubmed: GBP
- PPI: GraphSAGE
- Yelp: GraphSAINT
- Reddit: PPRGo
- Products & Papers100M: OGB
- Amazon: Cluster-GCN
- MAG: PANE