geometric-embedding-properties

Source code and detailed results for

Whitaker et al., "Characterizing the impact of geometric properties of word embeddings on task performance." In Proceedings of RepEval 2019.

This code is released under the MIT License. If you use it in your own work, please cite the following paper:

@inproceedings{Whitaker2019RepEval,
  author = {Whitaker, Brendan and Newman-Griffis, Denis and Haldar, Aparajita and Ferhatosmanoglu, Hakan and Fosler-Lussier, Eric},
  title = {Characterizing the impact of geometric properties of word embeddings on task performance},
  booktitle = {Proceedings of the Third Workshop on Evaluating Vector Space Representations for NLP (RepEval)},
  year = {2019}
}

Implementations

This repository includes implementations of the embedding transformation methods described in the above paper. They are broken down into three modules:

affine - Implementations of affine transformations. For more details, see the module's README.
CDE - Implementation of the cosine distance encoding (CDE) transformation. For more details, see the module's README.
NNE - Implementation of the nearest neighbor encoding (NNE) transformations. For more details, see the module's README.
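
As a rough illustration of what an affine transformation of an embedding matrix looks like, the sketch below applies a translation and a random orthogonal map with NumPy. It is not the code from the affine module: the function names (translate, random_orthogonal, apply_affine) and the toy 5 x 4 matrix are assumptions made for this example, and the module's README remains the reference for the actual interface.

```python
import numpy as np


def translate(emb, offset):
    """Shift every embedding (one per row) by the same offset vector."""
    return emb + offset


def random_orthogonal(dim, seed=0):
    """Draw a random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the diagonal of R is positive; Q stays orthogonal.
    return q * np.sign(np.diag(r))


def apply_affine(emb, matrix, offset):
    """Apply x -> matrix @ x + offset to every row of emb."""
    return emb @ matrix.T + offset


# Toy data standing in for real embeddings (hypothetical: 5 words, 4 dimensions).
emb = np.random.default_rng(1).standard_normal((5, 4))
rot = random_orthogonal(4)

rotated = apply_affine(emb, rot, np.zeros(4))
shifted = translate(emb, np.ones(4))
```

An orthogonal map preserves vector norms and pairwise cosine similarities, whereas a translation changes each vector's position relative to the origin, so the two affect different geometric properties of the space.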

Evaluation tasks

For evaluation tasks, we relied on two other repositories:

kudkudak/word-embeddings-benchmarks for intrinsic evaluations
drgriffis/Extrinsic-Evaluation-tasks for extrinsic evaluations

Data

Our full tables of results are included in the detailed-results directory. This includes separate files for intrinsic and extrinsic tasks for each set of word embeddings used.

The reference word embeddings we used are linked below:

Word2Vec - 300-d GoogleNews embeddings
GloVe - 300-d embeddings from 840B Common Crawl
FastText - 300-d Wikipedia/UMBC/StatMT embeddings with subword information
