Semi-Supervised Bilingual Lexicon Induction with Two-Way Message Passing Mechanisms

In this repository, we present the implementation of our two proposed semi-supervised approaches, CSS and PSS, for bilingual lexicon induction (BLI).

Dependencies

You need to download the MUSE dataset from here to the ./muse_data directory.

You need to download the VecMap dataset from here to the ./vecmap_data directory.

Run the following command to evaluate CSS on the MUSE dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-CSS-muse-en-es-5kall.yaml

Run the following command to evaluate PSS on the VecMap dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-PSS-vecmap-en-es-5kall.yaml

The most important fields in the configuration file are briefly described below:

"method" specifies the model to evaluate: "CSSBli" for CSS or "PSSBli" for PSS.
"src" and "tgt" indicate the source and target languages of the BLI task.
"data_params/data_dir" specifies which dataset to use: "./muse_data/" for MUSE or "./vecmap_data/" for VecMap.
"supervised/max_count" indicates the size of the annotated lexicon: "-1" for "5k all", "100" for "100 unique", and "5000" for "5000 unique".

Other fields specify the hyperparameters for CSS and PSS.
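
As a rough illustration of how the fields above fit together, the following Python sketch checks a configuration before a run. It is not part of this repository: the helper names (get_field, lexicon_label), the example dictionary, and the nested layout (inferred from the "a/b" field notation) are all assumptions, and the real YAML files may be organized differently.

```python
# Hypothetical sketch: the nested layout below is assumed from the
# "data_params/data_dir" style notation and may not match the real config files.

VALID_METHODS = {"CSSBli", "PSSBli"}


def get_field(config, path):
    """Look up a slash-separated path such as 'data_params/data_dir' in a nested dict."""
    node = config
    for key in path.split("/"):
        node = node[key]
    return node


def lexicon_label(max_count):
    """Map a supervised/max_count value to the lexicon name used in this README."""
    if max_count == -1:
        return "5k all"
    if max_count > 0:
        return f"{max_count} unique"
    raise ValueError(f"unsupported max_count: {max_count}")


# Example values only; they mirror the CSS / MUSE / en-es / "5k all" setting.
example_config = {
    "method": "CSSBli",
    "src": "en",
    "tgt": "es",
    "data_params": {"data_dir": "./muse_data/"},
    "supervised": {"max_count": -1},
}

if get_field(example_config, "method") not in VALID_METHODS:
    raise ValueError("method must be 'CSSBli' or 'PSSBli'")

print(lexicon_label(get_field(example_config, "supervised/max_count")))
```

A check like this can catch a misspelled method name or an unexpected lexicon size before a long evaluation starts.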