Source code and dataset for KDD 2019 paper "OAG: Toward Linking Large-scale Heterogeneous Entity Graphs"

Primary language: Python. License: MIT.

OAG

OAG: Toward Linking Large-scale Heterogeneous Entity Graphs.
Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li, and Kuansan Wang.
In KDD 2019 (Applied Data Science Track)

Prerequisites

  • Linux or macOS
  • Python 3
  • TensorFlow GPU >= 1.14
  • NVIDIA GPU + CUDA cuDNN
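
Before installing, it can help to confirm the Python and TensorFlow requirements above. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and the script does not verify the GPU, CUDA, or cuDNN setup.

```python
import re
import sys
from importlib import util

MIN_TENSORFLOW = (1, 14)  # minimum version taken from the Prerequisites list


def parse_version(text):
    """Convert '1.14.0' or '2.0.0rc1' into a tuple of leading integers."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def check_environment():
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    if util.find_spec("tensorflow") is None:
        problems.append("TensorFlow is not installed")
    else:
        import tensorflow as tf  # imported lazily so the check works without it

        if parse_version(tf.__version__) < MIN_TENSORFLOW:
            problems.append("TensorFlow %s is older than 1.14" % tf.__version__)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run as a script, it prints one warning per unmet requirement and nothing when both checks pass.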

Getting Started

Installation

Clone this repo.

git clone https://github.com/THUKG/OAG
cd OAG

Install the dependencies:

pip install -r requirements.txt

Dataset

The dataset can be downloaded from OneDrive, Tsinghua Cloud, or BaiduPan (password: gzpp). Unzip the file and put the data directory in the project directory.
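
The unzip step can also be scripted. This is a minimal sketch under two assumptions: the downloaded archive is a zip file, and it contains a top-level `data` directory. The function name `extract_dataset` is hypothetical and is not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, project_dir):
    """Unzip archive_path into project_dir and return the path of the data directory."""
    project_dir = Path(project_dir)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(project_dir)
    data_dir = project_dir / "data"
    # Assumption: the archive holds a top-level 'data' directory.
    if not data_dir.is_dir():
        raise FileNotFoundError("expected a 'data' directory at %s" % data_dir)
    return data_dir
```

For example, `extract_dataset("oag-data.zip", ".")` would extract into the current directory; the archive name here is a placeholder for whatever file the download produces.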

How to run

cd "$project_path"  # project_path: path to the cloned OAG directory
export PYTHONPATH="$project_path:$PYTHONPATH"
export CUDA_VISIBLE_DEVICES='?'  # replace ? with the GPU ID(s) to use, e.g. 0 or 0,1
cd core

# venue linking
python rnn/train.py

# paper linking
### LSH method
python hash/title2vec.py  # train doc2vec model
python hash/hash.py
### CNN method
python cnn/train.py

# author linking
python gat/preprocessing.py
python gat/train.py
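
The shell commands above can be wrapped in a small Python launcher that sets the same environment variables and runs the scripts of one task in order. This is a sketch only: the task labels in `TASKS` and the functions `build_env` and `run_task` are hypothetical names, while the script paths are copied from the commands above.

```python
import os
import subprocess
import sys

# Script paths relative to the core/ directory, copied from the commands above.
# The dictionary keys are hypothetical labels chosen for this sketch.
TASKS = {
    "venue": ["rnn/train.py"],
    "paper-lsh": ["hash/title2vec.py", "hash/hash.py"],
    "paper-cnn": ["cnn/train.py"],
    "author": ["gat/preprocessing.py", "gat/train.py"],
}


def build_env(project_path, gpus, base_env=None):
    """Return a copy of the environment with PYTHONPATH and CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Prepend the project path, keeping any PYTHONPATH that was already set.
    env["PYTHONPATH"] = project_path + (os.pathsep + existing if existing else "")
    env["CUDA_VISIBLE_DEVICES"] = gpus
    return env


def run_task(task, project_path, gpus):
    """Run every script of one linking task in order, stopping at the first failure."""
    env = build_env(project_path, gpus)
    core_dir = os.path.join(project_path, "core")
    for script in TASKS[task]:
        subprocess.run([sys.executable, script], cwd=core_dir, env=env, check=True)
```

For example, `run_task("author", "/path/to/OAG", "0")` would run the preprocessing script and then the training script for author linking on GPU 0; the path and GPU ID are placeholders.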

Cite

Please cite our paper if you use this code in your own work:

@inproceedings{zhang2019oag,
  title={OAG: Toward Linking Large-scale Heterogeneous Entity Graphs},
  author={Zhang, Fanjin and Liu, Xiao and Tang, Jie and Dong, Yuxiao and Yao, Peiran and Zhang, Jie and Gu, Xiaotao and Wang, Yan and Shao, Bin and Li, Rui and Wang, Kuansan},
  booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’19)},
  year={2019}
}