==============================
Code for the paper "Gromov Wasserstein Alignment of Word Embedding Spaces".
Disclaimer: This codebase borrows some embedding and evaluation tools from Mikel Artetxe's vecmap repo, and relies on the Gromov-Wasserstein implementation in the Python Optimal Transport (POT) library by Remi Flamary and colleagues.
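The core quantity POT computes here is a Gromov-Wasserstein coupling between the two embedding spaces. It matches words by comparing each space's internal distance structure rather than the vectors themselves, so the two spaces never need to share coordinates. As an illustration only, the dependency-free sketch below implements an entropically regularized variant with the square loss, alternating between a linearized cost and a Sinkhorn projection. It is not this repository's code and not POT's API; the helper names, the Euclidean intra-space distances, the uniform word weights, `eps`, and the iteration counts are all assumptions chosen for readability.

```python
import math

def dist_matrix(X):
    """Pairwise Euclidean distances within one space, scaled so the max is 1."""
    D = [[math.dist(x, y) for y in X] for x in X]
    top = max(max(row) for row in D) or 1.0
    return [[d / top for d in row] for row in D]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def sinkhorn(K, p, q, n_iter=200):
    """Rescale the positive kernel K so that rows sum to p and columns to q."""
    n, m = len(p), len(q)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def entropic_gw(X, Y, eps=0.1, outer_iter=20):
    """Entropic Gromov-Wasserstein coupling (square loss) between point sets X and Y."""
    C1, C2 = dist_matrix(X), dist_matrix(Y)
    n, m = len(C1), len(C2)
    p, q = [1.0 / n] * n, [1.0 / m] * m  # uniform weights over words (assumption)
    # Parts of sum_{k,l} (C1[i][k] - C2[j][l])^2 T[k][l] that depend only on the marginals
    a = [sum(C1[i][k] ** 2 * p[k] for k in range(n)) for i in range(n)]
    b = [sum(C2[j][l] ** 2 * q[l] for l in range(m)) for j in range(m)]
    C2t = [list(col) for col in zip(*C2)]
    T = [[p[i] * q[j] for j in range(m)] for i in range(n)]  # start from the independent coupling
    for _ in range(outer_iter):
        cross = matmul(matmul(C1, T), C2t)  # C1 @ T @ C2^T
        cost = [[a[i] + b[j] - 2.0 * cross[i][j] for j in range(m)] for i in range(n)]
        low = min(min(row) for row in cost)  # shift before exponentiating, for stability
        K = [[math.exp(-(c - low) / eps) for c in row] for row in cost]
        T = sinkhorn(K, p, q)
    return T

if __name__ == "__main__":
    X = [(0, 0), (1, 0), (0, 2), (3, 3)]
    Y = [X[2], X[0], X[3], X[1]]  # same shape, shuffled order
    T = entropic_gw(X, Y)
    # Candidate match for each source point: the column with the most mass
    print([max(range(len(Y)), key=lambda j: T[i][j]) for i in range(len(X))])
```

In real use one would call POT's solver on NumPy arrays rather than pure-Python lists. Reading off `argmax_j T[i][j]` as the candidate translation of source word `i` is one simple readout, not necessarily the one `main_gw_bli.py` uses.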
- tqdm
- matplotlib
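A quick way to confirm that the listed dependencies are importable from the active environment is a small standard-library check like the one below. The helper name is hypothetical, and the list only holds the two packages shown above; extend it with whatever else your setup needs. It expects top-level module names (no dots).

```python
import importlib.util

# Hypothetical helper; the names are the two dependencies listed above.
DEPENDENCIES = ["tqdm", "matplotlib"]

def missing_dependencies(names=DEPENDENCIES):
    """Return the subset of top-level module `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```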
It's highly recommended that the following steps be done inside a virtual environment (e.g., via virtualenv or anaconda).
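If neither virtualenv nor anaconda is at hand, Python's built-in `venv` module can create an environment as well. This is a minimal sketch. The `.venv` directory name and the `create_env` helper are arbitrary choices, not part of this repo. Activate the environment (`source .venv/bin/activate` on Linux/macOS, `.venv\Scripts\activate` on Windows) before running the install step.

```python
import venv
from pathlib import Path

def create_env(env_dir=".venv", with_pip=True):
    """Create a virtual environment with the stdlib `venv` module and return its path."""
    path = Path(env_dir)
    # with_pip=True bootstraps pip into the new environment via ensurepip
    venv.EnvBuilder(with_pip=with_pip).create(path)
    return path

if __name__ == "__main__":
    print("Created", create_env())
```

The command-line equivalent is `python3 -m venv .venv`; the function form is only convenient when scripting the setup.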
Install this package
git clone git@github.com:dmelis/otalign.git
cd otalign
pip3 install -e ./
Data for the 'Conneau' task can be obtained via the MUSE repo, and data for the 'Dinu' task can be obtained via the VecMap repo.
Copy data to local dirs (alternatively, the paths can be explicitly provided via arguments).
cp -r /path/to/MUSE/dir/data/* ./data/raw/MUSE/
cp -r /path/to/dinu/dir/data/* ./data/raw/dinu/
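The same copy can be scripted in Python, which is handy on systems without `cp`. This is a sketch with a hypothetical helper name; the source paths remain the placeholders used above, and `dirs_exist_ok` requires Python 3.8+.

```python
import shutil
from pathlib import Path

def copy_task_data(src_dir, dst_dir):
    """Copy the contents of src_dir into dst_dir, similar to `cp -r src/* dst/`.

    Unlike the shell glob, this also copies hidden (dot) files.
    Returns the sorted entry names now present in dst_dir.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for item in src.iterdir():
        target = dst / item.name
        if item.is_dir():
            shutil.copytree(item, target, dirs_exist_ok=True)  # merge into existing dirs
        else:
            shutil.copy2(item, target)  # copy file contents and metadata
    return sorted(p.name for p in dst.iterdir())

if __name__ == "__main__":
    # Placeholder source paths, exactly as in the cp commands above
    copy_task_data("/path/to/MUSE/dir/data", "data/raw/MUSE")
    copy_task_data("/path/to/dinu/dir/data", "data/raw/dinu")
```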
python scripts/main_gw_bli.py --task conneau --src en --trg es --maxiter 50
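To run the script for several language pairs in a row, the command above can be built programmatically. The sketch below uses only the four flags shown in this README; the helper names are hypothetical, and whether a given pair (for example `en`-`fr`) works depends on which data you downloaded.

```python
import subprocess
import sys

def gw_bli_command(src="en", trg="es", task="conneau", maxiter=50):
    """Build the command line shown above, using only the flags documented here."""
    return [
        sys.executable, "scripts/main_gw_bli.py",
        "--task", task,
        "--src", src,
        "--trg", trg,
        "--maxiter", str(maxiter),
    ]

def run_pairs(pairs, **kwargs):
    """Run the script once per (src, trg) pair; stop at the first non-zero exit."""
    for src, trg in pairs:
        subprocess.run(gw_bli_command(src, trg, **kwargs), check=True)

if __name__ == "__main__":
    run_pairs([("en", "es")])  # add further pairs as your data allows
```

Using `sys.executable` instead of a bare `python` ensures the script runs under the same interpreter (and virtual environment) where the package was installed.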
TODO: POT recently moved from cudamat to cupy for GPU computation, which broke this code. It can currently be run on small subsets of the tasks, but the CUDA dependencies will need to be fixed before it can solve the full problems.
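One way to work with small subsets is to truncate the embedding files before running the script. The hypothetical helper below assumes the word2vec/fastText text format (a `<count> <dim>` header line followed by one `<word> <values...>` line per word). fastText `.vec` files are typically sorted by frequency, so keeping the first N lines keeps the most frequent words. How the script should be pointed at the truncated files (default data location or explicit path arguments) is not specified in this section.

```python
def truncate_vec_file(src_path, dst_path, max_words):
    """Keep only the first `max_words` vectors of a word2vec-style text file.

    Assumes the first line is a "<count> <dim>" header. The header of the
    output file is rewritten with the number of vectors actually kept.
    """
    with open(src_path, encoding="utf-8") as fin:
        count, dim = map(int, fin.readline().split())
        lines = []
        for _ in range(min(max_words, count)):
            line = fin.readline()
            if not line:  # file shorter than its header claims
                break
            lines.append(line.rstrip("\n"))
    with open(dst_path, "w", encoding="utf-8") as fout:
        fout.write(f"{len(lines)} {dim}\n")
        for line in lines:
            fout.write(line + "\n")
    return len(lines)

if __name__ == "__main__":
    # Hypothetical file names; substitute your actual embedding files.
    truncate_vec_file("path/to/embeddings.vec", "path/to/embeddings.20k.vec", 20000)
```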