The code of our paper "SARW: Similarity-Aware Random Walk for GCN" is rewritten according to the paper "Graphsaint: Graph sampling based inductive learning method".
The following are some of the similar code environments as paper Graphsaint:
- python >= 3.6.8
- tensorflow >=1.12.0 / pytorch >= 1.1.0
- cython >=0.29.2
- numpy >= 1.14.3
- scipy >= 1.1.0
- scikit-learn >= 0.19.1
- pyyaml >= 3.12
- g++ >= 5.4.0
- openmp >= 4.0
All datasets used in our papers are available for download:
- PPI
- PPI-large (a larger version of PPI)
- Flickr
- Yelp
- Amazon
They are available on Google Drive link (alternatively, BaiduYun link (code: f1ao)). Rename the folder to data
at the root directory. The directory structure should be as below:
GraphSAINT/
│ README.md
│ run_graphsaint.sh
│ ...
│
└───graphsaint/
│ │ globals.py
│ │ cython_sampler.pyx
│ │ ...
│ │
│ └───tensorflow_version/
│ │ │ train.py
│ │ │ model.py
│ │ │ ...
│ │
│ └───pytorch_version/
│ │ train.py
│ │ model.py
│ │ ...
│
└───data/
│ └───ppi/
│ │ │ adj_train.npz
│ │ │ adj_full.npz
│ │ │ ...
│ │
│ └───reddit/
│ │ │ ...
│ │
│ └───...
│
The hyperparameters needed in training can be set via the configuration file: ./train_config/<name>.yml
.
The configuration files to reproduce the Table 2 results are packed in ./train_config/table2/
.
For detailed description of the configuration file format, please see ./train_config/README.md
First of all, please compile cython samplers (see above).
We suggest looking through the available command line arguments defined in ./graphsaint/globals.py
(shared by both the Tensorflow and PyTorch versions). By properly setting the flags, you can maximize CPU utilization in the sampling step (by telling the number of available cores), select the directory to place log files, and turn on / off loggers (Tensorboard, Timeline, ...), etc.
To run the code on CPU
python -m graphsaint.<tensorflow/pytorch>_version.train --data_prefix ./data/<dataset_name> --train_config <path to train_config yml> --gpu -1
To run the code on GPU
python -m graphsaint.<tensorflow/pytorch>_version.train --data_prefix ./data/<dataset_name> --train_config <path to train_config yml> --gpu <GPU number>
For example --gpu 0
will run on the first GPU. Also, use --gpu <GPU number> --cpu_eval
to make GPU perform the minibatch training and CPU to perform the validation / test evaluation.