Source code for ACL2019 paper "Multi-Channel Graph Neural Network for Entity Alignment".

Multi-Channel Graph Neural Network for Entity Alignment

This repository contains the code for the ACL 2019 paper "Multi-Channel Graph Neural Network for Entity Alignment". The paper can be found here.

Dependencies

Note: tensorboardX needs a TensorFlow installation to work correctly.

Code

Demo

To run a demo, simply execute the following script:

>> python example_train.py [GPU_id (if available)]
# example
>> python example_train.py 0

Custom run

To run the code on your own dataset:

  1. Format your data as described in Datasets;

  2. Execute rule mining with AMIE+;

    >> python format_data.py [PATH_TO_YOUR_DATASET]
    # example
    >> python format_data.py ./bin/DBP15k/fr_en

    Note: AMIE+ runs as an independent Java program, so you need to wait until AMIE+ has finished and then enter "amie ended" at the prompt to tell the Python program to proceed to the next step.

  3. Customize your run

    • Customization with config.py

      from config import Config
      config = Config()
    • Set the hyper-parameters

      config.set_cuda(True) # train on GPU (True) or CPU (False)
      config.set_dim(128) # set the dimension of embeddings and weight matrices
      config.set_align_gamma(1.0) # set gamma_1
      config.set_rel_align_gamma(1.0) # set gamma_2
      config.set_rule_gamma(0.12) # set gamma_r
      config.set_num_layer(2) # set layer number of MuGNN
      config.set_dropout(0.2) # set dropout rate
      config.set_learning_rate(0.001) # set learning rate
      config.set_l2_penalty(1e-2) # set L2 regularization coefficient
      config.set_update_cycle(5) # set negative sampling frequency
    • Set your dataset path

      config.init(YOUR_DATASET_PATH)
      # example
      config.init('./bin/DBP15k/fr_en')
    • Set log path

      config.init_log(LOG_FILE_PATH)
      # example
      config.init_log('./log/test')
    • Train

      config.train()
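
The customization steps above can be collected into one helper. This is only a sketch: `train_mugnn` and `DEFAULTS` are hypothetical names that are not part of the repository, the default values are simply the ones shown in the snippets above, and the order of calls (setters, then `init`, `init_log`, `train`) mirrors the order in which the steps are listed rather than a documented requirement. The `Config` class is passed in as an argument so that the helper itself does not depend on the repository being importable.

```python
# Hyper-parameter names and values copied from the README snippets above.
# Each key "x" corresponds to the setter config.set_x(...).
DEFAULTS = {
    "cuda": True,
    "dim": 128,
    "align_gamma": 1.0,
    "rel_align_gamma": 1.0,
    "rule_gamma": 0.12,
    "num_layer": 2,
    "dropout": 0.2,
    "learning_rate": 0.001,
    "l2_penalty": 1e-2,
    "update_cycle": 5,
}


def train_mugnn(config_cls, dataset_path, log_path, **overrides):
    """Hypothetical helper: apply the setters, then init, init_log and train."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown hyper-parameters: {sorted(unknown)}")

    config = config_cls()
    # Overrides replace the defaults but keep the original ordering.
    for name, value in {**DEFAULTS, **overrides}.items():
        getattr(config, f"set_{name}")(value)

    config.init(dataset_path)   # dataset folder, e.g. './bin/DBP15k/fr_en'
    config.init_log(log_path)   # log location, e.g. './log/test'
    config.train()
    return config


# Usage, from the repository root:
#   from config import Config
#   train_mugnn(Config, './bin/DBP15k/fr_en', './log/test', dim=64)
```

Keeping the defaults in a single dictionary makes it easy to record which settings a given run used.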

If you have any difficulties or questions about running the code, feel free to create an issue.

Datasets

Folder ./bin contains DBP15k and DWY100k datasets.

Directory structure

DBP15k/
      kg1_kg2/
            entity2id_kg1.txt
            entity2id_kg2.txt
            relation2id_kg1.txt
            relation2id_kg2.txt
            triples_kg1.txt
            triples_kg2.txt
            relation_seeds.txt
            entity_seeds.txt
            AMIE/
                  all2id_kg1.txt
                  all2id_kg2.txt
                  triples_kg1.txt
                  triples_kg2.txt
DWY100k/
      kg1_kg2/
            entity2id_kg1.txt
            entity2id_kg2.txt
            relation2id_kg1.txt
            relation2id_kg2.txt
            triples_kg1.txt
            triples_kg2.txt
            relation_seeds.txt
            train_entity_seeds.txt
            test_entity_seeds.txt
            AMIE/
                  all2id_kg1.txt
                  all2id_kg2.txt
                  triples_kg1.txt
                  triples_kg2.txt
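
Before running the code on a new dataset, it can help to check that the folder matches the layout above. The following is a minimal sketch, not part of the repository: `missing_files` and the constant names are hypothetical, and the file names are taken directly from the listing. Whether the AMIE folder must exist beforehand or is produced during rule mining is not stated here, so that check can be switched off.

```python
from pathlib import Path

# File names taken from the directory listing above.
COMMON_FILES = [
    "entity2id_kg1.txt", "entity2id_kg2.txt",
    "relation2id_kg1.txt", "relation2id_kg2.txt",
    "triples_kg1.txt", "triples_kg2.txt",
    "relation_seeds.txt",
]
AMIE_FILES = ["all2id_kg1.txt", "all2id_kg2.txt",
              "triples_kg1.txt", "triples_kg2.txt"]
SEED_FILES = {
    "DBP15k": ["entity_seeds.txt"],
    "DWY100k": ["train_entity_seeds.txt", "test_entity_seeds.txt"],
}


def missing_files(dataset_dir, layout="DBP15k", include_amie=True):
    """Return relative paths of expected files that are absent from dataset_dir."""
    root = Path(dataset_dir)
    expected = COMMON_FILES + SEED_FILES[layout]
    if include_amie:
        expected = expected + [f"AMIE/{name}" for name in AMIE_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list means every file named in the listing is present for the chosen layout.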

Data format

  • entity2id_kgx.txt: all entities from kgx with the corresponding ids. Format: entity_name + \t + id + \n;
  • relation2id_kgx.txt: all relations from kgx with the corresponding ids. Format: relation_name + \t + id + \n;
  • triples_kgx.txt: all triples from kgx. Format: entity1 + \t + entity2 + \t + relation + \n;
  • entity_seeds.txt: all entity seed alignments. Format: entity1 (from kg1) + \t + entity2 (from kg2) + \n;
  • train_entity_seeds.txt: entity seed alignments for training. Format: entity1 (from kg1) + \t + entity2 (from kg2) + \n;
  • test_entity_seeds.txt: entity seed alignments for test. Format: entity1 (from kg1) + \t + entity2 (from kg2) + \n;
  • relation_seeds.txt: all relation seed alignments. Format: relation1 (from kg1) + \t + relation2 (from kg2) + \n;
  • all2id_kgx.txt: all entities and relations from kgx with the corresponding ids. Format: entity/relation + \t + id + \n.
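
The formats above are all tab-separated text, so they can be read with a few lines of Python. This is a sketch under stated assumptions rather than the repository's own loader: `read_id_map` and `read_rows` are hypothetical names, ids are assumed to be integers, and the fields of triples and seed files are kept as strings because the description does not say whether they hold names or ids.

```python
def read_id_map(path):
    """Parse 'name<TAB>id' lines (entity2id, relation2id, all2id) into a dict."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the last tab so the id is always the final field.
            name, idx = line.rsplit("\t", 1)
            mapping[name] = int(idx)  # assumption: ids are integers
    return mapping


def read_rows(path, n_fields):
    """Parse tab-separated rows with exactly n_fields columns into tuples.

    Use n_fields=3 for triples files (entity1, entity2, relation) and
    n_fields=2 for entity or relation seed files.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue
            fields = tuple(line.split("\t"))
            if len(fields) != n_fields:
                raise ValueError(
                    f"{path}:{line_no}: expected {n_fields} fields, got {len(fields)}"
                )
            rows.append(fields)
    return rows
```

Raising on a wrong field count surfaces formatting mistakes early, before the data reaches rule mining or training.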

Note

  • DBP15k and DWY100k are arranged differently: DWY100k already splits the entity alignments into training and test sets, whereas DBP15k does not;
  • Folder AMIE contains data arranged in a structure designed to be compatible with AMIE+;
  • triples_kgx.txt in folder AMIE is encoded with all2id_kgx.txt.
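
Since DBP15k ships a single entity_seeds.txt, a reproducible train/test split has to happen somewhere. The sketch below shows one way to do it outside the repository. It is an illustration only: `split_seeds` is a hypothetical helper, the default `train_ratio` is a placeholder rather than a value taken from this README, and the code in this repository may perform its own split internally.

```python
import random


def split_seeds(pairs, train_ratio=0.3, seed=0):
    """Shuffle seed alignment pairs deterministically and split them.

    pairs: list of (entity1, entity2) tuples read from entity_seeds.txt.
    train_ratio: fraction used for training (placeholder default).
    seed: seed for the private random generator, for reproducibility.
    """
    shuffled = list(pairs)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # local RNG, no global state
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Using a private `random.Random` instance keeps the split stable across runs without affecting any other random number use in the program.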

Reference

If you use the code, please cite our paper:

@inproceedings{cao2019muti,
  title={Multi-Channel Graph Neural Network for Entity Alignment},
  author={Cao, Yixin and Liu, Zhiyuan and Li, Chengjiang and Liu, Zhiyuan and Li, Juanzi and Chua, Tat-Seng},
  booktitle={ACL},
  year={2019}
}

Acknowledgement

This research is supported by the National Research Foundation, Singapore under its International Research Centres in Singapore Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.