Boosted Graph Neural Networks

The code and data for the ICLR 2021 paper "Boost then Convolve: Gradient Boosting Meets Graph Neural Networks".

This code contains implementations of the following models for graphs:

  • CatBoost
  • LightGBM
  • Fully-Connected Neural Network (FCNN)
  • GNN (GAT, GCN, AGNN, APPNP)
  • FCNN-GNN (GAT, GCN, AGNN, APPNP)
  • ResGNN (CatBoost + {GAT, GCN, AGNN, APPNP})
  • BGNN (end-to-end {CatBoost + {GAT, GCN, AGNN, APPNP}})

Installation

To run the models, you have to download the repo, install the requirements, and extract the datasets.

First, let's create a python environment:

mkdir envs
cd envs
python -m venv bgnn_env
source bgnn_env/bin/activate
cd ..

Second, let's download the code and install the requirements:

git clone https://github.com/nd7141/bgnn.git 
cd bgnn
unzip datasets.zip
make install

Next, we need to install proper versions of PyTorch and DGL, depending on the CUDA version of your machine. We strongly encourage using GPU-supported versions of DGL (the speedup in training can be 100x).

First, determine your CUDA version with nvcc --version. Then, check the installation instructions for PyTorch. For example, for CUDA version 9.2, install it as follows:

pip install torch==1.7.1+cu92 torchvision==0.8.2+cu92 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

If you don't have a GPU, use the following:

pip install torch==1.7.1+cpu torchvision==0.8.2+cpu torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Similarly, you need to install the DGL library. For example, for cuda==9.2:

pip install dgl-cu92

For the CPU version of DGL:

pip install dgl

Tested versions of torch and dgl are:

  • torch==1.7.1+cu92
  • dgl_cu92==0.5.3
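
To double-check what got installed, here is a minimal sanity check in Python (it only uses standard torch and dgl attributes):

# print installed versions and CUDA availability
import torch
import dgl

print("torch:", torch.__version__)          # e.g. 1.7.1+cu92
print("dgl:", dgl.__version__)              # e.g. 0.5.3
print("CUDA available:", torch.cuda.is_available())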

Running

The starting point is the file scripts/run.py:

python scripts/run.py dataset models
    (optional)
            --save_folder: str = None
            --task: str = 'regression'
            --repeat_exp: int = 1
            --max_seeds: int = 5
            --dataset_dir: str = None
            --config_dir: str = None

Available options for dataset:

  • house (regression)
  • county (regression)
  • vk (regression)
  • wiki (regression)
  • avazu (regression)
  • vk_class (classification)
  • house_class (classification)
  • dblp (classification)
  • slap (classification)
  • path/to/your/dataset

Available options for models are catboost, lightgbm, gnn, resgnn, bgnn, all.

Each model is specified by its config. Check the configs/ folder to specify the parameters of the model and the run.

Upon completion, the results will be saved in the specified folder (default: results/{dataset}/day_month/). This folder will contain aggregated_results.json with the aggregated results for each model. Each model will have 4 numbers, in this order: mean metric (RMSE or accuracy), std of the metric, mean runtime, std of the runtime. The file seed_results.json will have results for each experiment and each seed. Additional folders will contain loss values during training.
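
To inspect these numbers programmatically, here is a minimal sketch; the path is a hypothetical example, and it assumes the JSON maps each model name to its list of 4 numbers:

# read aggregated results; assumed layout: model name -> [mean metric, std metric, mean runtime, std runtime]
import json

with open("results/house/day_month/aggregated_results.json") as f:  # hypothetical path
    results = json.load(f)

for model, (mean_metric, std_metric, mean_time, std_time) in results.items():
    print(f"{model}: metric {mean_metric:.4f} +- {std_metric:.4f}, "
          f"runtime {mean_time:.1f}s +- {std_time:.1f}s")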


Examples

The following script will launch all models on the House dataset.

python scripts/run.py house all

The following script will launch the CatBoost and GNN models on the SLAP classification dataset.

python scripts/run.py slap catboost gnn --task classification

The following script will launch the LightGBM model on 5 splits of the data, repeating each experiment 3 times.

python scripts/run.py vk lightgbm --repeat_exp 3 --max_seeds 5

The following script will launch the resgnn and bgnn models, saving the results to a custom folder.

python scripts/run.py county resgnn bgnn --save_folder ./county_resgnn_bgnn

Running on your dataset

To run the code on your dataset, you need to prepare the files in the right format.

You can check the examples in the datasets/ folder.

There should be at least X.csv (node features), y.csv (target labels), and graph.graphml (the graph in graphml format).

Make sure to keep these filenames for your dataset.

You can also have cat_features.txt specifying the names of the categorical columns.

You can also have masks.json specifying train/val/test splits.
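
Below is a minimal sketch of writing such files with pandas and networkx; the my_dataset/ folder, the column names, and the masks.json layout are illustrative assumptions, so compare against the examples in datasets/ before running.

# write a toy dataset in the expected files; names and masks.json layout are assumptions
import json
import os

import networkx as nx
import pandas as pd

os.makedirs("my_dataset", exist_ok=True)  # hypothetical dataset folder
n = 100  # toy number of nodes

# node features and target labels, one row per node
X = pd.DataFrame({"feat1": range(n), "feat2": ["a", "b"] * (n // 2)})
y = pd.DataFrame({"target": [i % 2 for i in range(n)]})
X.to_csv("my_dataset/X.csv", index=False)
y.to_csv("my_dataset/y.csv", index=False)

# graph over the same node ids (0..n-1), saved in graphml format
nx.write_graphml(nx.path_graph(n), "my_dataset/graph.graphml")

# optional: names of categorical feature columns, one per line
with open("my_dataset/cat_features.txt", "w") as f:
    f.write("feat2\n")

# optional: train/val/test splits (assumed layout: split id -> lists of node indices)
masks = {"0": {"train": list(range(60)), "val": list(range(60, 80)), "test": list(range(80, n))}}
with open("my_dataset/masks.json", "w") as f:
    json.dump(masks, f)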

After that run the script as usual:

python scripts/run.py path/to/your/dataset gnn catboost 

Citation

@inproceedings{
ivanov2021boost,
title={Boost then Convolve: Gradient Boosting Meets Graph Neural Networks},
author={Sergei Ivanov and Liudmila Prokhorenkova},
booktitle={International Conference on Learning Representations (ICLR)},
year={2021},
url={https://openreview.net/forum?id=ebS5NUfoMKL}
}