The IMPLI Dataset
This repository contains the data collected as part of the IMPLI dataset, to be published at ACL 2022.
This dataset consists of both semi-supervised and gold annotated pairs. These pairs consist of a literal sentence and a figurative counterpart. The pairs are designed to be either entailing or non-entailing, providing a rich, diverse set of paired data to explore figurative language.
The idioms and metaphors folders each contain files of paired sentences, grouped by the type of collection used, the source dataset, and the intended entailment relation. Each file is in .tsv format: the first column is the context and the second is the hypothesis. Some files also carry a third column with metaphoricity scores from the original datasets.
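As a rough illustration of the column layout described above, the following sketch reads one file into (context, hypothesis, score) tuples. It is not part of the repository: the names `Pair` and `read_pairs` are hypothetical, and it assumes the files are UTF-8 encoded, have no header row, and use quote characters only as ordinary text.

```python
import csv
from typing import Iterator, NamedTuple, Optional


class Pair(NamedTuple):
    context: str
    hypothesis: str
    score: Optional[str]  # raw third column kept as text; None when absent


def read_pairs(path: str) -> Iterator[Pair]:
    """Yield one Pair per row of a tab-separated IMPLI-style file."""
    # Assumption: UTF-8 encoding and no header row.
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            score = row[2] if len(row) > 2 else None
            yield Pair(row[0], row[1], score)
```

The score is left as a string because the README does not state its numeric format; convert it only after inspecting the file in question.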
Collection Types
- adversarial_definition: Annotators created hand-crafted adversarial definitions of idioms, which were then slotted into relevant contexts, yielding non-entailing pairs.
- fig_context: Dictionary definitions were substituted into figurative contexts, yielding entailment pairs.
- lit_context: Dictionary definitions were substituted into literal contexts, yielding non-entailment pairs.
- replacement: Annotators generated literal paraphrases for metaphoric constructions, which were substituted into figurative contexts, yielding entailment pairs.
- manual: Annotators manually wrote entailing and non-entailing pairs.
Datasets
For the automatic replacements, we started with sentences from other datasets, replacing key components to yield pairs.
- magpie: The MAGPIE idiom dataset
- pie: The PIE idiom dataset
- semeval: The SemEval idiom dataset (Task 5)
Relations
Files ending in _e.tsv are intended to contain entailing relations between the context and hypothesis; files ending in _ne.tsv are intended to contain non-entailing pairs.
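The suffix convention can be turned into a label with a small helper. This is only a sketch: the function name `label_from_filename` and the label strings are assumptions chosen for illustration, not something the repository defines.

```python
from pathlib import Path


def label_from_filename(path: str) -> str:
    """Map a file name to its intended relation using the _e / _ne suffix."""
    name = Path(path).name
    if name.endswith("_ne.tsv"):
        return "non-entailment"  # label string is an assumption
    if name.endswith("_e.tsv"):
        return "entailment"  # label string is an assumption
    raise ValueError(f"no _e.tsv or _ne.tsv suffix in file name: {name}")
```

For instance, a hypothetical path such as `idioms/example_ne.tsv` would map to `"non-entailment"`. Since the files are only intended to contain these relations, treat the label as the intended one rather than a verified annotation for every pair.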
For questions/comments/suggestions, please contact Kevin Stowe:
kevincstowe@gmail.com
Disclaimer
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Citation
If you use the IMPLI dataset, please cite the following publication, to appear at ACL 2022:
@inproceedings{stowe-2022,
title = "IMPLI: Investigating NLI Models' Performance on Figurative Language",
author = "Stowe, Kevin and Utama, Prasetya Ajie and Gurevych, Iryna",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
month = "06",
year = "2022",
publisher = "Association for Computational Linguistics",
url = "tbd",
}