
Scalable Knowledge Enhancement of Graph Neural Networks

This repository contains our experiments with Knowledge Enhanced Neural Networks (KENN) [1] on the ogbn-arxiv and ogbn-products datasets from the Open Graph Benchmark [2]. To make KENN feasible on large graphs, we propose Restrictive Neighbourhood Sampling, a graph-specific mini-batching method that bounds the space complexity of training. We use Weights & Biases [3] to track our experiments. Our implementation builds on the graph neural network framework PyTorch Geometric and on PyTorch Scatter [4].
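For intuition, here is a minimal sketch of bounded-fanout mini-batching using PyTorch Geometric's NeighborLoader. It illustrates the general idea of capping the sampled neighbourhood per hop; it is not the repository's Restrictive Neighbourhood Sampling implementation, and the dataset and fanout values are illustrative only.

```python
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

# Illustrative dataset; the repository also supports the OGB datasets.
dataset = Planetoid(root="data", name="Cora")
data = dataset[0]

loader = NeighborLoader(
    data,
    num_neighbors=[10, 10, 10],   # cap the fanout at each of the 3 hops
    batch_size=1024,              # number of target nodes per batch
    input_nodes=data.train_mask,  # build batches around training nodes
    shuffle=True,
)

for batch in loader:
    # Each batch is a subgraph; its first `batch.batch_size` nodes are
    # the target nodes, the rest are sampled neighbours.
    num_targets = batch.batch_size
```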

This is a work of the Tyrex Team. In case of comments or questions, feel free to email us.

KENN has been developed by Alessandro Daniele, Riccardo Mazzieri and Luciano Serafini.

Overview

Knowledge Enhanced Neural Networks (KENNs) [1] integrate prior knowledge, expressed as logical formulas, into an artificial neural network by adding Knowledge Enhancement (KE) layers to the network architecture. Previous results show that the model outperforms both pure neural and neural-symbolic models on small graphs, but struggles to scale to larger ones. In this work, we address the knowledge enhancement of neural networks on large graphs and carry out experiments on datasets from the Open Graph Benchmark (OGB) [2]. We show that on large graphs neighbourhood explosion occurs, making full-batch training of the model infeasible. To solve this problem, we first analyse the space complexity of the knowledge enhancement layers and then propose a graph-specific mini-batching strategy that makes them applicable to large-scale graphs. We validate our method on two OGB datasets.
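As a rough illustration of the mechanism, the toy PyTorch module below sketches the boost computed by a single clause enhancer under our reading of [1]: each clause carries a learnable, non-negative clause weight and nudges upward the preactivation of the literal that is currently easiest to satisfy. Class and variable names are ours; see [1] for the actual formulation.

```python
import torch
import torch.nn as nn

class ClauseEnhancer(nn.Module):
    """Toy sketch of one knowledge-enhancement step for a single clause.

    This is a simplified reading of [1], not the reference implementation.
    """
    def __init__(self, signs):
        super().__init__()
        # One learnable clause weight, kept non-negative via softplus.
        self.raw_weight = nn.Parameter(torch.zeros(1))
        # +1 for positive literals, -1 for negated ones.
        self.register_buffer("signs", torch.tensor(signs, dtype=torch.float))

    def forward(self, z):
        # z: [num_nodes, num_literals] preactivations of the clause's literals.
        w = nn.functional.softplus(self.raw_weight)
        # Boost concentrates on the literal closest to satisfying the clause.
        delta = w * torch.softmax(self.signs * z, dim=-1)
        return z + self.signs * delta  # enhanced preactivations
```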

How to run the code

  1. To make sure the right environment is used, the necessary Python packages and their versions are specified in requirements.txt. We tested our implementation with Python 3.9. To install the requirements, go to the project directory and run
pip install -r requirements.txt
  2. For tracking the experiments, a free Weights & Biases account is required. Sign up at https://wandb.ai/site and log in with the command
wandb login

Follow the instructions in the command line.

Then, adapt the project and entity parameters in wandb.init(...) in run_experiments.py to your own project, as sketched below. More instructions on Weights & Biases can be found here. All subsequent experiments, their results, and their hyperparameters are then tracked on Weights & Biases.
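For example (the project and entity names below are placeholders, not the repository's defaults):

```python
import wandb

wandb.init(
    project="scalable-ke",  # your W&B project name (placeholder)
    entity="your-team",     # your W&B user or team name (placeholder)
    config={"datasets": "ogbn-arxiv", "lr": 0.01},
)
```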

  3. To run experiments, execute the following command from the project directory; the parameters specified in conf.json are used.
    python run_experiments.py conf.json

  4. To change the parameters, modify the conf.json file; see the parameter table and the example configuration below.

| Parameter | Description | Default Value |
| --- | --- | --- |
| datasets | Dataset to be used: ogbn-products, ogbn-arxiv, Cora, CiteSeer, or PubMed | ogbn-arxiv |
| planetoid_split | Type of dataset split for Planetoid datasets, see [4] | public |
| sampling_neighbor_size | Number of neighbours sampled per step up to the sampling depth | -1 (all) |
| batch_size | Number of target nodes per batch | 10000 |
| num_kenn_layers | Number of KENN layers | 3 |
| num_layers_sampling | Sampling depth, i.e. the n-hop neighbourhood to sample from | 3 |
| hidden_channels | Number of hidden units in the base NN | 256 |
| dropout | Dropout rate | 0.5 |
| lr | Learning rate | 0.01 |
| epochs | Number of epochs | 300 |
| runs | Number of independent runs | 10 |
| model | Model: "GCN", "MLP", "KENN_GCN", or "KENN_MLP" | "KENN_MLP" |
| mode | Training mode: transductive or inductive | transductive |
| binary_preactivations | Artificial preactivations of binary predicates | 500.0 |
| es_enabled | Early stopping enabled: True or False | True |
| es_min_delta | Early stopping minimum delta | 0.001 |
| es_patience | Early stopping patience | 10 |
| full_batch | Enable full-batch training: True or False | False |
| num_workers | Number of parallel workers for the NeighborLoader | 0 |
| seed | Random seed | 0 |
| train_sampling | How to sample the training batches; here only restrictive neighbourhood sampling | "default" |
| eval_steps | How often to evaluate within the training loop | 10 |
| save_data_stats | Save an overview of dataset statistics as a text file: True or False | False |
| create_kb | If True, create knowledge as defined in [1]; if False, use knowledge_base | True |
| knowledge_base | Custom knowledge base | "" |
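For illustration, a conf.json using the defaults above could look as follows. The exact key set is defined by run_experiments.py, so treat this as a sketch derived from the table rather than the shipped configuration:

```json
{
  "datasets": "ogbn-arxiv",
  "planetoid_split": "public",
  "sampling_neighbor_size": -1,
  "batch_size": 10000,
  "num_kenn_layers": 3,
  "num_layers_sampling": 3,
  "hidden_channels": 256,
  "dropout": 0.5,
  "lr": 0.01,
  "epochs": 300,
  "runs": 10,
  "model": "KENN_MLP",
  "mode": "transductive",
  "binary_preactivations": 500.0,
  "es_enabled": true,
  "es_min_delta": 0.001,
  "es_patience": 10,
  "full_batch": false,
  "num_workers": 0,
  "seed": 0,
  "train_sampling": "default",
  "eval_steps": 10,
  "save_data_stats": false,
  "create_kb": true,
  "knowledge_base": ""
}
```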

References

[1] A. Daniele, L. Serafini. Neural Network Enhancement with Logical Knowledge, 2020. URL: https://arxiv.org/abs/2009.06087. doi:10.48550/arXiv.2009.06087.

[2] W. Hu, M. Fey, M. Zitnik, Y. Dong, H. Ren, B. Liu, M. Catasta, J. Leskovec. Open Graph Benchmark: Datasets for Machine Learning on Graphs, 2020. URL: https://arxiv.org/abs/2005.00687.

[3] L. Biewald. Experiment Tracking with Weights and Biases, 2020. URL: https://www.wandb.com/. Software available from wandb.com.

[4] M. Fey, J. E. Lenssen. Fast Graph Representation Learning with PyTorch Geometric, in: ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.

We make use of the KENN implementation (in PyTorch) and the example baselines for OGB, both publicly available on GitHub. The OGB and KENN software are distributed under their respective licenses.