RE-Net

Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs (EMNLP 2020)

PyTorch implementation of Recurrent Event Network (RE-Net)

Paper: Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs

TL;DR: We propose an autoregressive model to infer graph structures at unobserved times on temporal knowledge graphs (extrapolation problem).

This repository contains the implementation of the RE-Net architectures described in the paper.

Knowledge graph reasoning is a critical task in natural language processing. The task becomes more challenging on temporal knowledge graphs, where each fact is associated with a timestamp. Most existing methods focus on reasoning at past timestamps and cannot predict facts that happen in the future. This paper proposes Recurrent Event Network (RE-Net), a novel autoregressive architecture for predicting future interactions. The occurrence of a fact (event) is modeled as a probability distribution conditioned on temporal sequences of past knowledge graphs. Specifically, RE-Net employs a recurrent event encoder to encode past facts and uses a neighborhood aggregator to model the connections between facts at the same timestamp. Future facts can then be inferred sequentially based on the two modules. We evaluate the proposed method via link prediction at future times on five public datasets. Through extensive experiments we demonstrate the strength of RE-Net, especially on multi-step inference over future timestamps, and achieve state-of-the-art performance on all five datasets.
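
The autoregressive assumption described above can be written compactly. The following is only a sketch in notation chosen here for illustration: $G_t$ stands for the set of events at time $t$, $(s_t, r_t, o_t)$ for a subject, relation, and object at that time, and $m$ for an assumed history window length. The second line is just the chain rule applied to the first; please refer to the paper for the exact formulation.

```latex
\begin{aligned}
p(G) &= \prod_{t} \prod_{(s_t, r_t, o_t) \in G_t}
        p\big(s_t, r_t, o_t \mid G_{t-m:t-1}\big) \\
     &= \prod_{t} \prod_{(s_t, r_t, o_t) \in G_t}
        p\big(o_t \mid s_t, r_t, G_{t-m:t-1}\big)\,
        p\big(r_t \mid s_t, G_{t-m:t-1}\big)\,
        p\big(s_t \mid G_{t-m:t-1}\big)
\end{aligned}
```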

If you make use of this code or the RE-Net algorithm in your work, please cite the following paper:

@inproceedings{jin2020Renet,
	title={Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs},
	author={Jin, Woojeong and Qu, Meng and Jin, Xisen and Ren, Xiang},
	booktitle={EMNLP},
	year={2020}
}

Installation (LEGACY)

Run the following commands to create a conda environment (assuming CUDA 10.1):

conda create -n renet python=3.6 numpy
conda activate renet
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
conda install -c dglteam "dgl-cuda10.1<0.5"

Installation (AWS)

Run the following commands to clone the existing pytorch_p37 conda environment and install DGL (note that the dgl-cu110 wheel is built for CUDA 11.0):

conda create --name pytorch_p37_renet --clone pytorch_p37
source activate pytorch_p37_renet
pip install dgl-cu110 -f https://data.dgl.ai/wheels/repo.html
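
After either installation, a quick sanity check can confirm that the packages are importable before any training is started. The snippet below is a minimal sketch and not part of this repository; `check_packages` is a hypothetical helper introduced here for illustration, and it only relies on the standard library plus `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def check_packages(names):
    """Return {package name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Packages installed by the commands above.
status = check_packages(["numpy", "torch", "dgl"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")

# Only query CUDA if PyTorch is actually installed.
if status["torch"]:
    import torch

    print("torch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False`, the `--gpu 0` option used in the commands of the next section will most likely not work as intended on that machine.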

Train and Test

This code includes RE-Net with the RGCN aggregator. Before running it, preprocess the dataset:

cd data/DATA_NAME
python3 get_history_graph.py

We first pretrain the global model.

python3 pretrain.py -d DATA_NAME --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024

Then, train the model.

python3 train.py -d DATA_NAME --gpu 0 --dropout 0.5 --n-hidden 200 --lr 1e-3 --max-epochs 20 --batch-size 1024

We are ready to test!

python3 test.py -d DATA_NAME --gpu 0 --n-hidden 200

Predict (produce a prediction file for all subjects in the test file)

python3 predict.py -d DATA_NAME --gpu 0 --n-hidden 200

The default hyperparameters give the best performance.
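
The five steps above have to run in order, so it can be convenient to script them. The following is a minimal sketch, not a script shipped with this repository: `build_pipeline` and `run_pipeline` are hypothetical helpers, and they only reuse the script names and flags shown above. It assumes it is started from the repository root.

```python
import subprocess
from pathlib import Path


def build_pipeline(data_name, gpu=0):
    """Return (working_dir, argv) pairs for each stage, in execution order."""
    gpu_opts = ["--gpu", str(gpu)]
    hidden = ["--n-hidden", "200"]
    # Flags shared by pretrain.py and train.py, in the order used above.
    fit_opts = [*gpu_opts, "--dropout", "0.5", *hidden,
                "--lr", "1e-3", "--max-epochs", "20", "--batch-size", "1024"]
    root = Path(".")
    return [
        (Path("data") / data_name, ["python3", "get_history_graph.py"]),
        (root, ["python3", "pretrain.py", "-d", data_name, *fit_opts]),
        (root, ["python3", "train.py", "-d", data_name, *fit_opts]),
        (root, ["python3", "test.py", "-d", data_name, *gpu_opts, *hidden]),
        (root, ["python3", "predict.py", "-d", data_name, *gpu_opts, *hidden]),
    ]


def run_pipeline(data_name, gpu=0, dry_run=True):
    """Print every stage; execute it too when dry_run is False."""
    for cwd, argv in build_pipeline(data_name, gpu):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the pipeline as soon as one stage fails.
            subprocess.run(argv, cwd=cwd, check=True)


# Dry run: only prints the commands for ICEWS18.
run_pipeline("ICEWS18")
```

Calling `run_pipeline("ICEWS18", dry_run=False)` would execute the stages for real; the dry run is the default so that nothing long-running starts by accident.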

Related Work

Our work addresses an extrapolation problem, on which there is only a small amount of prior work. Many studies on temporal knowledge graphs focus on the interpolation problem instead. We organized a list of related work covering Temporal Knowledge Graph Reasoning, Dynamic Graph Embedding, Knowledge Graph Embedding, and Static Graph Embedding.

Datasets

There are five datasets: ICEWS18, ICEWS14 (from Know-Evolve), GDELT, WIKI, and YAGO. These datasets are for the extrapolation problem, so timestamps in the test set should be later than those in the train and valid sets, and timestamps in the valid set should be later than those in the train set. Each data folder has 'stat.txt', 'train.txt', 'valid.txt', 'test.txt', and 'get_history_graph.py'.

  • 'get_history_graph.py': Generates the history and graph inputs for the model.
  • 'stat.txt': The first value is the number of entities, and the second value is the number of relations.
  • 'train.txt', 'valid.txt', 'test.txt': The first column is the subject entity, the second column is the relation, the third column is the object entity, and the fourth column is the time. The fifth column exists for Know-Evolve's data format and is ignored by RE-Net.
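
The file layout described above can be read with a few lines of Python. This is a minimal sketch rather than the loader used by the repository: it assumes whitespace-separated integer IDs, the names `Quad`, `parse_stat`, `parse_quads`, and `splits_are_chronological` are hypothetical, and the sample rows are made up for illustration.

```python
from collections import namedtuple

Quad = namedtuple("Quad", ["subject", "relation", "object", "time"])


def parse_stat(text):
    """Read the number of entities and relations from the content of stat.txt."""
    fields = text.split()
    return int(fields[0]), int(fields[1])


def parse_quads(text):
    """Parse (subject, relation, object, time) rows from a split file's content."""
    quads = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # A fifth column, if present, is ignored.
        s, r, o, t = (int(x) for x in fields[:4])
        quads.append(Quad(s, r, o, t))
    return quads


def splits_are_chronological(train, valid, test):
    """True when all valid times follow all train times, and all test times follow all valid times."""
    return (max(q.time for q in train) < min(q.time for q in valid)
            and max(q.time for q in valid) < min(q.time for q in test))


# Made-up sample content standing in for the real files.
num_entities, num_relations = parse_stat("4 2")
train = parse_quads("0\t0\t1\t0\t0\n1\t1\t2\t24\t0\n")
valid = parse_quads("2\t0\t3\t48\t0\n")
test = parse_quads("3\t1\t0\t72\t0\n")

print(num_entities, num_relations)
print(train[0])
print(splits_are_chronological(train, valid, test))
```

For real data, the strings would be replaced by the contents of the files, for example `parse_quads(open("train.txt").read())`. The chronological check is a cheap way to catch a custom dataset whose splits overlap in time.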

For relation names in GDELT, please refer to the GDELT codebook.

Baselines

We use the following public code and hyperparameters for the baselines. We validated embedding sizes among the presented values.

| Baselines | Code | Embedding size | Batch size |
|---|---|---|---|
| TransE (Bordes et al., 2013) | Link | 100, 200 | 1024 |
| DistMult (Yang et al., 2015) | Link | 100, 200 | 1024 |
| ComplEx (Trouillon et al., 2016) | Link | 50, 100, 200 | 100 |
| RGCN (Schlichtkrull et al., 2018) | Link | 200 | Default |
| ConvE (Dettmers et al., 2018) | Link | 200 | 128 |
| Know-Evolve (Trivedi et al., 2017) | Link | Default | Default |
| HyTE (Dasgupta et al., 2018) | Link | 128 | Default |

We implemented TA-TransE, TA-DistMult, and TTransE ourselves. These baselines can be run with a command such as the following:

cd ./baselines
CUDA_VISIBLE_DEVICES=0 python3 TA-TransE.py -f 1 -d ICEWS18 -L 1 -bs 1024 -n 1000

The user can find implementations in the 'baselines' folder.