Please cite the following work if you find the data/code useful.
@article{yang2020heterogeneous,
title={Heterogeneous Network Representation Learning: Survey, Benchmark, Evaluation, and Beyond},
author={Yang, Carl and Xiao, Yuxin and Zhang, Yu and Sun, Yizhou and Han, Jiawei},
journal={arXiv preprint arXiv:2004.00216},
year={2020}
}
Please contact us if you have problems with the data/code, and also if you think your work is relevant but missing from the survey.
Yuxin Xiao (yuxinx2@illinois.edu), Carl Yang (yangji9181@gmail.com)
We provide 4 HIN benchmark datasets: DBLP, Yelp, Freebase, and PubMed.
Each dataset contains:
- 3 data files (node.dat, link.dat, label.dat);
- 2 evaluation files (link.dat.test, label.dat.test);
- 2 description files (meta.dat, info.dat);
- 1 recording file (record.dat).
Please refer to the Data folder for more details.
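As a quick sanity check after downloading a dataset, the file list above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `missing_files` and the example path are assumptions, and only the eight file names come from this README.

```python
from pathlib import Path

# File names as listed in this README, grouped by role.
EXPECTED_FILES = {
    "data": ["node.dat", "link.dat", "label.dat"],
    "evaluation": ["link.dat.test", "label.dat.test"],
    "description": ["meta.dat", "info.dat"],
    "recording": ["record.dat"],
}


def missing_files(dataset_dir):
    """Return the expected file names absent from dataset_dir, sorted by name.

    Hypothetical helper for illustration; the directory layout is assumed.
    """
    root = Path(dataset_dir)
    return sorted(
        name
        for names in EXPECTED_FILES.values()
        for name in names
        if not (root / name).is_file()
    )
```

For example, `missing_files("Data/DBLP")` (the path is an assumption about where the dataset was unpacked) returns an empty list when all eight files are present.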
This stage transforms a dataset from its original format to the training input format.
Users need to specify the target dataset, the target model, and the training settings.
Please refer to the Transform folder for more details.
We provide 11 HIN baseline implementations:
- 5 Proximity-Preserving Methods (metapath2vec-ESim, PTE, HIN2Vec, AspEm, HEER);
- 3 Message-Passing Methods (R-GCN, HAN, HGT);
- 3 Relation-Learning Methods (TransE, DistMult, ConvE).
Please refer to the Model folder for more details.
This stage evaluates the output embeddings based on specific tasks.
Users need to specify the target dataset, the target model, and the evaluation tasks.
Please refer to the Evaluate folder for more details.
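Both the Transform and Evaluate stages ask for a dataset and a model, so it can help to validate the chosen names before running anything. The sketch below only encodes the names listed in this README. The helper `check_selection` is hypothetical, and the exact spellings the repository's scripts accept may differ, so treat the strings as assumptions.

```python
# Dataset and model names as listed in this README.
DATASETS = ("DBLP", "Yelp", "Freebase", "PubMed")

MODELS = {
    "proximity-preserving": ("metapath2vec-ESim", "PTE", "HIN2Vec", "AspEm", "HEER"),
    "message-passing": ("R-GCN", "HAN", "HGT"),
    "relation-learning": ("TransE", "DistMult", "ConvE"),
}


def check_selection(dataset, model):
    """Return the category of `model`, or raise ValueError for an unknown name.

    Hypothetical helper for illustration; not part of the repository.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    for category, names in MODELS.items():
        if model in names:
            return category
    raise ValueError(f"unknown model: {model!r}")
```

Calling `check_selection("DBLP", "HAN")` returns the category string for HAN, while a misspelled dataset or model raises a `ValueError` before any long-running job starts.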