/SemiSupBLI

Implementation of the paper accepted at EMNLP 2020

Semi-Supervised Bilingual Lexicon Induction with Two-Way Message Passing Mechanisms

This repository contains the implementation of our two proposed semi-supervised approaches for BLI, CSS and PSS.

Dependencies

  • python 3.7
  • Pytorch
  • Numpy
  • Faiss
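
Before running anything, it can help to confirm that the dependencies above are importable. The following is a minimal sketch and not part of the repository: the helper name missing_modules is hypothetical, and the import names "torch", "numpy" and "faiss" are assumed to be the ones the listed packages install under.

```python
import importlib.util
import sys

# Assumption: PyTorch is imported as "torch" and Faiss as "faiss".
REQUIRED_MODULES = ("torch", "numpy", "faiss")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The README lists python 3.7; print the running version for comparison.
    print("python", sys.version.split()[0])
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty "missing" list only shows that the modules can be located; it does not check their versions.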

How to get the datasets

You need to download the MUSE dataset from here to the ./muse_data directory.

You need to download the VecMap dataset from here to the ./vecmap_data directory.

How to run

Run the following command to evaluate CSS on the MUSE dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-CSS-muse-en-es-5kall.yaml

Run the following command to evaluate PSS on the VecMap dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-PSS-vecmap-en-es-5kall.yaml

Configuration

We briefly describe some important fields in the configuration file:

  • "method" specifies the model to evaluate: "CSSBli" for CSS or "PSSBli" for PSS.
  • "src" and "tgt" indicate the source and target languages of the BLI task.
  • "data_params/data_dir" specifies which dataset to use: "./muse_data/" for MUSE or "./vecmap_data/" for VecMap.
  • "supervised/max_count" indicates the size of the annotated lexicon: "-1" for "5k all", "100" for "100 unique" and "5000" for "5000 unique".

Other fields specify the hyperparameters for CSS and PSS.
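
As an illustration of how the fields described above fit together, here is a minimal sketch that builds the corresponding settings as a nested Python dictionary and checks them against the documented values. It is not the repository's loader: the helper names (make_config, check_config, lexicon_label) are hypothetical, the nesting is inferred from the "a/b" notation, and the real YAML files contain further hyperparameter fields that are not shown here.

```python
# Values taken from the field descriptions in this README.
ALLOWED_METHODS = {"CSSBli", "PSSBli"}
ALLOWED_DATA_DIRS = {"./muse_data/", "./vecmap_data/"}
LEXICON_LABELS = {-1: "5k all", 100: "100 unique", 5000: "5000 unique"}


def make_config(method, src, tgt, data_dir, max_count):
    """Build a nested dict; the nesting under "data_params" and "supervised" is an assumption."""
    return {
        "method": method,
        "src": src,
        "tgt": tgt,
        "data_params": {"data_dir": data_dir},
        "supervised": {"max_count": max_count},
    }


def check_config(config):
    """Return a list of problems; an empty list means the documented fields look consistent."""
    problems = []
    if config["method"] not in ALLOWED_METHODS:
        problems.append("unknown method: %r" % config["method"])
    if config["data_params"]["data_dir"] not in ALLOWED_DATA_DIRS:
        problems.append("unknown data_dir: %r" % config["data_params"]["data_dir"])
    return problems


def lexicon_label(config):
    """Map supervised/max_count to the lexicon name used in this README."""
    return LEXICON_LABELS.get(config["supervised"]["max_count"], "not listed in the README")


if __name__ == "__main__":
    # Settings matching the CSS / MUSE / en-es / "5k all" example command.
    cfg = make_config("CSSBli", "en", "es", "./muse_data/", -1)
    print(check_config(cfg), lexicon_label(cfg))
```

Only "method" and "data_dir" are validated, because the README lists their values exhaustively; other "max_count" values may be accepted by the code, so they are labeled rather than rejected.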