
GNNSampler: Bridging the Gap between Sampling Algorithms of GNN and Hardware

An implementation of the locality-aware optimization in GNNSampler. It can be plugged into the pre-processing step to accelerate the sampling process in sampling-based models. Since the paper is under review, additional scripts for flexibly adjusting the sampling weights will be open-sourced after acceptance.

Overview

GNNSampler is a unified programming model for mainstream sampling algorithms that covers the key procedures of the general sampling process. One can embed GNNSampler into the general sampling process to learn large-scale graphs. The following figure describes the workflow of learning large-scale graph data with GNNs, where GNNSampler is embedded to optimize sampling. Moreover, to leverage hardware features, we choose data locality as a case study and implement locality-aware optimizations in GNNSampler. The right part of the figure illustrates a case of data locality exploration. More details can be found in our paper.

[Figure: workflow of GNN training with GNNSampler embedded for sampling optimization; the right part illustrates data locality exploration]
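
To make the pre-processing hook concrete, here is a minimal, illustrative sketch of a locality-aware weight adjustment, not the paper's exact formulation (those scripts are withheld until acceptance). The `locality_aware_weights` helper, the `alpha` knob, the ID-distance proxy for locality, and the commented file path are all assumptions for illustration:

```python
# Illustrative sketch (not the paper's exact method): bias neighbor-sampling
# weights toward neighbors whose IDs are close to the target node, so sampled
# neighbors tend to sit in nearby memory regions of a CSR adjacency matrix.
import numpy as np
import scipy.sparse as sp

def locality_aware_weights(adj: sp.csr_matrix, node: int, alpha: float = 0.5):
    """Return (neighbors, sampling probabilities) for `node`.

    `alpha` (a hypothetical knob) trades off uniform sampling against a
    locality bonus for neighbors stored close to `node` in the ID space.
    """
    neighbors = adj.indices[adj.indptr[node]:adj.indptr[node + 1]]
    if neighbors.size == 0:
        return neighbors, np.empty(0)
    # Smaller ID distance -> assumed better spatial locality -> larger weight.
    dist = np.abs(neighbors.astype(np.int64) - node)
    locality = 1.0 / (1.0 + dist)
    weights = (1.0 - alpha) + alpha * locality / locality.max()
    return neighbors, weights / weights.sum()

# Example: sample up to 5 neighbors of node 42 with locality-biased weights.
# adj = sp.load_npz("data/amazon/adj_full.npz")  # hypothetical path
# nbrs, p = locality_aware_weights(adj, 42)
# sampled = np.random.choice(nbrs, size=min(5, nbrs.size), p=p, replace=False)
```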

Experimental Devices

Platform | Configuration
---------|------------------------------------------
CPU      | 2× Intel Xeon E5-2683 v3 (14 cores each)
GPU      | NVIDIA Tesla V100 (16 GB memory)

Dependencies

  • python
  • tensorflow
  • numpy
  • scipy
  • scikit-learn
  • pyyaml

Usage

One can use the following shell scripts to perform accelerated model training with locality-aware optimization:
./locality_amazon.sh
./locality_reddit.sh
./locality_flickr.sh
For comparison, one can use the following shell scripts to run the vanilla methods (without locality-aware optimization):
./vanilla_amazon.sh
./vanilla_reddit.sh
./vanilla_flickr.sh

Code Directory

GNNSampler/
│   README.md
│   locality_amazon.sh (optimized model training on the Amazon dataset with GraphSAINT as the backbone)
│   vanilla_amazon.sh (vanilla model training on the Amazon dataset with GraphSAINT as the backbone)
│   ...
├───graphsaint/
│   (the TensorFlow-based implementation of GraphSAINT)
├───precomputed_weight/
│   (pre-computed weights for some datasets, to reproduce the performance reported in the paper)
├───train_config/
│   (training configurations, generally taken from the backbone's repository)
└───data/
    (please download datasets and place them in this folder)
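
As a hedged example of how the pre-computed weights might be consumed, the sketch below loads a weight file and normalizes it into a sampling distribution. The file name and the .npy format are assumptions; inspect precomputed_weight/ for the actual naming per dataset:

```python
# Hypothetical sketch: turn a pre-computed weight file into sampling
# probabilities. File name and .npy format are assumptions.
import numpy as np

weights = np.load("precomputed_weight/amazon_weight.npy")  # hypothetical name
probs = weights / weights.sum()                  # normalize to a distribution
batch = np.random.choice(len(probs), size=512, p=probs)  # draw 512 node IDs
```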

Datasets

All datasets used in our paper (Amazon, Reddit, and Flickr) are publicly available.
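
Since the repository builds on GraphSAINT's TensorFlow implementation, the datasets presumably follow GraphSAINT's published data format; the expected file names below are an assumption to check against the backbone's documentation. A small sanity check:

```python
# Verify that a dataset folder under data/ contains the files GraphSAINT's
# format defines. The file list is an assumption based on that format.
from pathlib import Path

EXPECTED = ["adj_full.npz", "adj_train.npz", "role.json",
            "class_map.json", "feats.npy"]

def check_dataset(name: str, root: str = "data") -> None:
    folder = Path(root) / name
    missing = [f for f in EXPECTED if not (folder / f).exists()]
    if missing:
        raise FileNotFoundError(f"{folder}: missing {missing}")
    print(f"{folder}: all expected files present")

# check_dataset("amazon")
```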

Acknowledgements

The locality-aware optimization is embedded in various sampling-based models to verify its efficiency and effectiveness. We use the implementations of GraphSAGE, FastGCN, and GraphSAINT as backbones, and owe many thanks to the authors for making their code available.