
Guiding Neural Entity Alignment with Compatibility

This repo contains the source code of the paper "Guiding Neural Entity Alignment with Compatibility", accepted at EMNLP 2022.

Download the data from this Dropbox directory, decompress it, and put it under emea_code/ as shown in the folder structure below.

📌 The code has been tested. Feel free to create issues if you cannot run it successfully. Thanks!

Structure of Folders

emea_code/
  - datasets/
  - emea/
  - OpenEA/
  - scripts/
  - environment.yml
  - README.md

After you run EMEA, an output/ folder will be created to store the evaluation results.

Device

Our experiments were run on a GPU server with 3 NVIDIA GeForce RTX 2080 Ti GPUs and Ubuntu 20.04. We suggest using at least two GPUs to avoid out-of-memory issues.

Install Conda Environment

First, cd into the project directory. Then run the following command to install the main packages:

conda env create -f environment.yml
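After creating the environment, activate it before running anything. A typical sequence is sketched below; the environment name is an assumption here (check the `name:` field in environment.yml if activation fails), and the sanity check assumes a TensorFlow-based setup, as the neural EA models used here are TensorFlow implementations:

```shell
# Assumed environment name; see the `name:` field of environment.yml.
conda activate emea
# Optional sanity check that the deep-learning backend imports correctly.
python -c "import tensorflow as tf; print(tf.__version__)"
```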

With this environment, you can run EMEA with RREA and Dual-AMN, two state-of-the-art neural EA models. If you also want to run EMEA with AliNet or IPTransE, which we use to verify the generality of EMEA, install the following additional packages with pip:

pip install igraph
pip install python-Levenshtein
pip install gensim
pip install dataclasses

Run Scripts

  • Go to the scripts folder first via cd scripts/.
  • Before running any script, you can modify data_name, train_percent, initial_training, and other settings in the script as needed.
  • The GPU settings are CUDA_VISIBLE_DEVICES="0" and tf_device="1": the former is for EMEA, the latter for the neural EA model.
  • Starting with the default settings is a good option.
  • The evaluation results, including the metrics of every step, are saved under the output/ directory.
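As a concrete illustration of the settings described above, the configurable block at the top of a script might look like the following. The variable names come from the bullet points, but the values and exact format here are assumptions, not defaults from the repo; check each script for its real options:

```shell
# Hypothetical settings block (example values, not the repo's defaults).
data_name="zh_en"               # dataset: zh_en, ja_en, or fr_en
train_percent=30                # share of seed alignments used for training
initial_training="supervised"   # assumed option value; check the script
export CUDA_VISIBLE_DEVICES="0" # GPU visible to EMEA
tf_device="1"                   # GPU used by the neural EA model
echo "EMEA on $data_name with $train_percent% training data"
```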

Below are the scripts for different purposes:

  • Run sh run_emea_w_rrea.sh to reproduce results of EMEA on the three 15K datasets (i.e. zh_en, ja_en, fr_en) shown in Table 1.
  • Run sh run_emea_w_rrea_100k.sh to reproduce results of EMEA on the two 100K datasets (i.e. dbp_wd, dbp_yg) shown in Table 1. The source code of RREA cannot run on GPU for these datasets, so we run it on CPU, which takes a very long time.
  • Run sh run_emea_w_rrea_semi.sh to reproduce results of semi-supervised RREA on the three 15K datasets shown in Table 2.
  • To reproduce the generality results of EMEA shown in Fig. 4, run run_emea_w_dual_amn.sh, run_emea_w_alinet.sh, or run_emea_w_iptranse.sh. We suggest trying the Dual-AMN model first because AliNet and IPTransE are relatively slow.
  • Run sh run_emea_avoidconf.sh to reproduce the results of AvoidConf rule shown in Figure 6.
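Since the three 15K datasets share one script, runs can be queued with a simple loop. The sketch below is deliberately a dry run that only echoes the commands it would issue, because the scripts may set data_name internally rather than read it from the environment; replace echo with the real invocation once you have confirmed how the script you are using picks up data_name:

```shell
# Dry-run sketch: print the commands that would cover all three 15K datasets.
for data_name in zh_en ja_en fr_en; do
  echo "sh run_emea_w_rrea.sh  # with data_name=$data_name"
done
```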

Citation

Please cite this paper if you use the released code in your work.

@inproceedings{DBLP:conf/emnlp/0025SHZZZ22,
  author    = {Bing Liu and
               Harrisen Scells and
               Wen Hua and
               Guido Zuccon and
               Genghong Zhao and
               Xia Zhang},
  editor    = {Yoav Goldberg and
               Zornitsa Kozareva and
               Yue Zhang},
  title     = {Guiding Neural Entity Alignment with Compatibility},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural
               Language Processing, {EMNLP} 2022, Abu Dhabi, United Arab Emirates,
               December 7-11, 2022},
  pages     = {491--504},
  publisher = {Association for Computational Linguistics},
  year      = {2022},
  url       = {https://aclanthology.org/2022.emnlp-main.32},
  timestamp = {Tue, 07 Feb 2023 17:10:51 +0100},
  biburl    = {https://dblp.org/rec/conf/emnlp/0025SHZZZ22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Acknowledgement

We used the source code of RREA, Dual-AMN, and OpenEA.