umwe


Learning Multilingual Word Embeddings without Cross-Lingual Supervision (a project for COMPSCI 585 and 682).

Paper: https://arxiv.org/pdf/1808.08933.pdf

Data Sources:

FastText: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md

PyTorch embeddings: https://drive.google.com/drive/u/1/folders/1LSJc_dNm8nveXBAP7e_ksSDjnmGU849S

SemEval 17: http://alt.qcri.org/semeval2017/task2/data/uploads

Translation Task: https://github.com/facebookresearch/MUSE

To run the code, download the FastText embeddings for the languages you want from the FastText link above into a folder called "wordvecs", then run the umwe.py script.
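
Before launching umwe.py, it can help to confirm that the embeddings are in place and readable. The sketch below is not part of this repository: the helper names `missing_embeddings` and `read_vec` are hypothetical, and the `wiki.<lang>.vec` file naming is an assumption based on how the FastText pretrained vectors are usually distributed, so adjust it if umwe.py expects different names. It relies only on the FastText text format, where the first line holds the vocabulary size and dimension and each following line holds a word and its vector.

```python
import io
import os


def missing_embeddings(langs, folder="wordvecs"):
    """Return the language codes with no wiki.<lang>.vec file in `folder`.

    The wiki.<lang>.vec naming is an assumption, not something umwe.py
    is known to require.
    """
    return [
        lang for lang in langs
        if not os.path.isfile(os.path.join(folder, f"wiki.{lang}.vec"))
    ]


def read_vec(stream, max_vocab=200000):
    """Parse FastText text-format vectors from an open text stream.

    The header line is "<vocab_size> <dim>"; every other line is a word
    followed by <dim> space-separated floats. Malformed lines are skipped.
    """
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in stream:
        if len(vectors) >= max_vocab:
            break
        parts = line.rstrip("\n").split(" ")
        # Drop empty fields caused by a trailing space at the end of a line.
        word, values = parts[0], [p for p in parts[1:] if p]
        if len(values) != dim:
            continue
        vectors[word] = [float(v) for v in values]
    return dim, vectors


if __name__ == "__main__":
    # Tiny in-memory sample so the sketch runs without any downloads.
    sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
    dim, vectors = read_vec(sample)
    print(dim, sorted(vectors))
    print(missing_embeddings(["en", "es"]))
```

If `missing_embeddings` returns an empty list for your languages, the "wordvecs" folder is populated under the assumed naming. To inspect a real file, open it with `open(path, encoding="utf-8")` and pass the handle to `read_vec` with a small `max_vocab`.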