compdisteval-ellipsis

This repository links to the datasets, models, and code for evaluating composition models on VP-elliptical sentences, as described in the article:

Gijs Wijnholds and Mehrnoosh Sadrzadeh. Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings. NAACL-HLT 2019.

If you find any of this useful, please consider citing our paper as:

@inproceedings{wijnholds2019evaluating,
  title = {Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings},
  author = {Gijs Wijnholds and Mehrnoosh Sadrzadeh},
  year = {2019},
  booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  publisher = {Association for Computational Linguistics}
}

Datasets

We provide two new datasets, extending the verb disambiguation dataset of Grefenstette & Sadrzadeh 2011 and the transitive sentence similarity dataset of Kartsaklis & Sadrzadeh 2013.

ELLDIS

This dataset extends the verb disambiguation dataset of Grefenstette & Sadrzadeh 2011 to VP-elliptical settings. Link

ELLSIM

This dataset extends the transitive sentence similarity dataset of Kartsaklis & Sadrzadeh 2013 to VP-elliptical settings. Link

Models

We provide four trained vector spaces, each built with a popular embedding method. For each vector space, we also provide a separate tensor space containing learned matrices for the 85 verbs that occur in the evaluation datasets. The tensors are stored in a flattened format, so each one must be reshaped to size (d, d), where d is the dimension of the corresponding vector space (2000 for count, 300 for the others).
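
The reshaping step can be sketched as follows. This is a minimal illustration, not the repository's own loading code: the helper name `reshape_flat` is hypothetical, and it assumes the d*d values of a verb are stored in row-major order (the first d values form the first row), which should be checked against the evaluation code.

```python
def reshape_flat(flat, d):
    """Turn a flat sequence of d*d numbers into a list of d rows of length d."""
    if len(flat) != d * d:
        raise ValueError(f"expected {d * d} values, got {len(flat)}")
    # Assumption: row-major order, i.e. the first d values form the first row.
    return [list(flat[i * d:(i + 1) * d]) for i in range(d)]


# Toy example with d = 2; the provided spaces use d = 2000 or d = 300.
flat_verb = [1.0, 2.0, 3.0, 4.0]
verb_matrix = reshape_flat(flat_verb, 2)
```

If the values are already in a NumPy array, `numpy.asarray(flat).reshape(d, d)` does the same thing under the same row-major assumption.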

Model Name	Dimensions	Vectors	Tensors
count	2000	link	link
word2vec	300	link	link
glove	300	link	link
fasttext	300	link	link

Code

We provide code for evaluating the vector space models on the new datasets. It is part of my main code repository for evaluating compositional distributional semantics models, which can be found here