Graph convolutional networks

Overview

This is my attempt to reproduce graph classification results from recent papers [1, 2] using Graph U-Net. So far, my Graph U-Net results are worse than those of the baseline (GCN). I also compare against a recent work on Multigraph GCN (MGCN) [4].

This repository contains all necessary data for the PROTEINS dataset. It can be found here along with similar datasets.

The baseline model is the Graph Convolutional Network (GCN) [3]. The decoder part of Graph U-Net is not yet implemented in our code, i.e. the only difference from the baseline is pooling that drops nodes between graph convolution layers.
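To illustrate what pooling by dropping nodes means, here is a minimal NumPy sketch in the spirit of the top-k pooling used by Graph U-Net. It is not the code in graph_unet.py: the function name topk_pool, the ratio parameter, the fixed projection vector p (learnable in a real model), and the sigmoid gate are all assumptions made for the example, and other variants use a different nonlinearity.

```python
import numpy as np


def topk_pool(X, A, p, ratio=0.5):
    """Keep the highest-scoring fraction of nodes (illustrative sketch).

    X: (N, F) node features, A: (N, N) adjacency matrix,
    p: (F,) projection vector (hypothetical; learnable in a real model).
    """
    n_nodes = X.shape[0]
    k = max(1, int(np.ceil(ratio * n_nodes)))      # number of nodes to keep
    scores = X @ p / np.linalg.norm(p)             # one scalar score per node
    idx = np.argsort(-scores, kind="stable")[:k]   # indices of the k largest scores
    gate = 1.0 / (1.0 + np.exp(-scores[idx]))      # sigmoid gate (assumption)
    X_pooled = X[idx] * gate[:, None]              # scale the kept features by the gate
    A_pooled = A[np.ix_(idx, idx)]                 # subgraph induced by the kept nodes
    return X_pooled, A_pooled, idx
```

Multiplying the kept features by the gate is what lets gradients reach p in a trainable version; the dropped nodes and their edges are simply discarded.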

Hyperparameters are taken from [2], but learning rate decay and dropout are also applied. The readout layer (the last pooling layer over nodes) is also simplified to just max pooling over nodes. All hyperparameters are the same for the baseline, Graph U-Net and Multigraph GCN (MGCN).
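As a rough illustration of a max-pooling readout, the sketch below collapses per-node features into one vector per graph. The function names and the zero-padded batch layout with a boolean mask are assumptions for the example and may differ from what graph_unet.py does.

```python
import numpy as np


def max_readout(X):
    """Collapse (N, F) node features into a single (F,) graph vector."""
    return X.max(axis=0)


def masked_max_readout(X, mask):
    """Max readout for a padded batch (hypothetical layout).

    X: (B, N, F) node features, zero-padded to N nodes per graph.
    mask: (B, N), True for real nodes and False for padding.
    """
    mask = np.asarray(mask, dtype=bool)
    # Padded nodes are set to -inf so they can never win the max.
    X_masked = np.where(mask[:, :, None], X, -np.inf)
    return X_masked.max(axis=1)
```

The mask matters when features can be negative: without it, the zero rows used for padding would override the real maxima.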

The implementation is deliberately basic and largely unoptimized, which makes the code easier to debug and experiment with.

python graph_unet.py --model gcn  # to run baseline GCN
python graph_unet.py --model unet  # to run Graph U-Net
python graph_unet.py --model mgcn  # to run Multigraph GCN

To use the PyTorch Geometric data loader, add the --torch-geom flag.

To repeat an experiment 10 times with different random seeds:

for i in $(seq 1 10); do seed=$(( ( RANDOM % 10000 )  + 1 )); python graph_unet.py --model gcn --seed $seed | tee logs/gcn_proteins_"$i".log; done

The log files can then be read as follows:

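import os
from os.path import join as pjoin

import numpy as np

results_dir = './logs'
acc = []
for f in os.listdir(results_dir):
    with open(pjoin(results_dir, f), 'r') as fp:
        s = fp.readlines()[-1]  # the last line of each log is expected to hold the accuracy
    # Extract the number between the first ':' and the following '('.
    pos1 = s.find(':')
    acc.append(float(s[pos1+1:s[pos1:].find('(') + pos1]))
print(len(acc), np.mean(acc), np.std(acc))  # number of runs, mean, std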
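If the logs directory may contain incomplete or unrelated files, a slightly more defensive variant could look like the sketch below. It assumes, as the snippet above does, that the accuracy appears after a ':' and before a '(' on the line; the helper names parse_accuracy and summarize and the example line are hypothetical. It uses only the standard library, and statistics.pstdev matches the default (population) behaviour of np.std.

```python
import re
import statistics

# A number that follows ':' and precedes '(' (assumed log format).
ACC_PATTERN = re.compile(r":\s*([-+]?\d+(?:\.\d+)?)\s*\(")


def parse_accuracy(line):
    """Return the accuracy found in a log line, or None if the line does not match."""
    match = ACC_PATTERN.search(line)
    return float(match.group(1)) if match else None


def summarize(lines):
    """Return (count, mean, population std) over the lines that parse."""
    acc = [a for a in map(parse_accuracy, lines) if a is not None]
    if not acc:
        raise ValueError("no accuracy values found")
    return len(acc), statistics.mean(acc), statistics.pstdev(acc)


# Hypothetical final log line, for illustration only.
print(parse_accuracy("Test accuracy: 74.71 (std 3.44)"))
```

Here summarize would be given the last line of each log file; lines that do not match are skipped instead of raising inside float().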

Results

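The PROTEINS column reports the average and standard deviation of accuracy (%) over 10-fold cross-validation. We also repeat the experiments 10 times (as shown above) with different random seeds and report the average and standard deviation over those 10 runs in the PROTEINS (10 times) column.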

Model	PROTEINS	PROTEINS (10 times)
GCN [3]	74.71 ± 3.44*	74.37 ± 0.31
GCN [3] + A²	74.36 ± 4.57	74.56 ± 0.26
GCN [3] + A² + 2I	74.45 ± 4.91	74.23 ± 0.37
Graph U-Net [1, 2]	72.39 ± 3.34	72.45 ± 0.88
Graph U-Net [1, 2] + A²	72.90 ± 4.08	72.87 ± 0.52
Graph U-Net [1, 2] + A² + 2I	73.63 ± 4.67	73.18 ± 0.50
Multigraph GCN (MGCN) [4]	74.62 ± 2.56	75.56 ± 0.27

*74.72 ± 2.90 with PyTorch 1.0.0.

Some datasets contain additional float-valued node attributes, which can improve graph classification a lot. Note that some algorithms, including Weisfeiler-Lehman (WL) Graph Kernels, are not able to make use of these additional attributes, so a fair comparison must account for which inputs each algorithm uses.