This repository links to the datasets, models and code for evaluating composition models for VP-elliptical sentences, as described in the article:
Gijs Wijnholds and Mehrnoosh Sadrzadeh. Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings. NAACL-HLT 2019.
If you find any of this useful, please consider citing our paper:
```bibtex
@inproceedings{wijnholds2019evaluating,
  title = {Evaluating Composition Models for Verb Phrase Elliptical Sentence Embeddings},
  author = {Gijs Wijnholds and Mehrnoosh Sadrzadeh},
  year = {2019},
  booktitle = {Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)},
  publisher = {Association for Computational Linguistics}
}
```
We provide two new datasets, extending the verb disambiguation dataset of Grefenstette & Sadrzadeh 2011 and the transitive sentence similarity dataset of Kartsaklis & Sadrzadeh 2013.
The first dataset extends the verb disambiguation dataset of Grefenstette & Sadrzadeh 2011 to VP-elliptical settings. Link
The second dataset extends the transitive sentence similarity dataset of Kartsaklis & Sadrzadeh 2013 to VP-elliptical settings. Link
We provide four trained vector spaces, each built with a different popular embedding method. For each vector space we also provide a separate tensor space containing learned matrices for the 85 verbs that occur in the evaluation datasets. The tensors are stored in flattened form, so each one must be reshaped to size (d, d), where d is the dimension of the corresponding vector space.
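As an illustration of the reshaping step, here is a minimal sketch in plain Python. It assumes the flattening is row-major (C-order), so that entry (i, j) sits at position i*d + j; the README does not state the ordering, so verify it against the released files. The function name `reshape_flat_tensor` is hypothetical and not part of the released code.

```python
from typing import List, Sequence


def reshape_flat_tensor(flat: Sequence[float], d: int) -> List[List[float]]:
    """Turn a flattened verb tensor of length d*d into a d x d matrix (list of rows).

    Assumes row-major (C-order) flattening: entry (i, j) is stored at index i*d + j.
    This ordering is an assumption, not something stated by the dataset release.
    """
    if len(flat) != d * d:
        raise ValueError(f"expected {d * d} values for d={d}, got {len(flat)}")
    # Slice the flat sequence into d consecutive rows of length d.
    return [list(flat[i * d:(i + 1) * d]) for i in range(d)]


# Toy example with d = 2; the real spaces use d = 300 or d = 2000.
matrix = reshape_flat_tensor([1.0, 2.0, 3.0, 4.0], d=2)
print(matrix)  # [[1.0, 2.0], [3.0, 4.0]]
```

With NumPy the same step is `np.asarray(flat).reshape(d, d)`, which also uses row-major order by default; pass `order="F"` instead if the files turn out to be column-major.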
Model Name | Dimension (d) | Vectors | Tensors |
---|---|---|---|
count | 2000 | link | link |
word2vec | 300 | link | link |
glove | 300 | link | link |
fasttext | 300 | link | link |
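Because each verb matrix has d × d entries, the tensor spaces differ greatly in size. The sketch below computes the expected flattened length per verb and the total over the 85 verbs for each model in the table; the byte estimate assumes 32-bit floats, which is an assumption rather than a documented property of the files. `tensor_sizes` is a hypothetical helper for illustration only.

```python
# Dimensions copied from the table above.
MODEL_DIMS = {"count": 2000, "word2vec": 300, "glove": 300, "fasttext": 300}
NUM_VERBS = 85  # number of verbs with learned matrices


def tensor_sizes(dims, num_verbs, bytes_per_value=4):
    """Return {model: (values per verb matrix, total values, approx. bytes)}.

    bytes_per_value=4 assumes float32 storage (an assumption, not documented).
    """
    sizes = {}
    for name, d in dims.items():
        per_verb = d * d                 # a flattened (d, d) matrix
        total = per_verb * num_verbs     # all verb matrices for this model
        sizes[name] = (per_verb, total, total * bytes_per_value)
    return sizes


for name, (per_verb, total, nbytes) in tensor_sizes(MODEL_DIMS, NUM_VERBS).items():
    print(f"{name}: {per_verb:,} values per verb, {total:,} in total, ~{nbytes / 1e9:.2f} GB")
```

For the count model this gives 4,000,000 values per verb and 340,000,000 in total, which is worth keeping in mind before loading every matrix into memory at once.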
Code for evaluating the vector space models on the new datasets is available in my main repository for the evaluation of compositional distributional semantics: here.
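The evaluation code itself lives in the linked repository. For orientation only, the sketch below shows one common protocol for datasets of this kind, assumed here rather than taken from that code: compute the cosine similarity of the two composed sentence embeddings in each pair, then report Spearman's rank correlation with the human similarity judgments. The toy vectors and scores are made up for illustration and are not drawn from the datasets.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all values are equal")
    return cov / math.sqrt(sxx * syy)


# Toy sentence-embedding pairs (already composed) with made-up human scores.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
human = [7.0, 5.0, 1.0]
model = [cosine(u, v) for u, v in pairs]
print(round(spearman(model, human), 3))  # 1.0
```

If SciPy is available, `scipy.stats.spearmanr` computes the same tie-aware statistic and is the more practical choice for real evaluations.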