Using NLI (RoBERTa from HuggingFace) for data-to-text NLG evaluation. Tested on the WebNLG and E2E datasets.
This code accompanies the following paper:
Ondřej Dušek & Zdeněk Kasner (2020): Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference. In Proceedings of INLG.
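The general idea can be sketched as follows. This is a rough illustration and not the code in nli_eval.py: each input fact is checked for entailment from the generated text (to detect omissions), and the text is checked for entailment from all facts joined together (to detect hallucinations). The names check_accuracy and toy_entails are hypothetical, and toy_entails is a simple word-overlap stand-in for a real NLI model so that the sketch runs without downloading anything.

```python
import re


def toy_entails(premise, hypothesis):
    """Stand-in for an NLI model (assumption for illustration only):
    true if every word of the hypothesis also occurs in the premise."""
    def words(sentence):
        return set(re.findall(r"\w+", sentence.lower()))
    return words(hypothesis) <= words(premise)


def check_accuracy(facts, text, entails=toy_entails):
    """Check a generated text against the input facts in both directions."""
    # Omission check: the text should entail every single fact.
    omitted = [fact for fact in facts if not entails(text, fact)]
    # Hallucination check: the facts taken together should entail the text.
    hallucination = not entails(" ".join(facts), text)
    return {"omitted": omitted, "hallucination": hallucination}


facts = ["Alan Bean was born in Wheeler.", "Alan Bean was a test pilot."]
result = check_accuracy(facts, "Alan Bean was born in Wheeler.")
print(result)
```

In practice, the entails argument would wrap an NLI model that returns whether the entailment label is predicted for a premise/hypothesis pair.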
The code requires Python 3 and some additional packages (most importantly Transformers; see requirements.txt). To install, clone this repository and then run:
pip install -r requirements.txt
We recommend using a virtualenv for the installation.
Run ./nli_eval.py -h for a list of available options.
Basic usage (default settings):
./nli_eval.py --type <type> input.tsv output.json
The type is either webnlg or e2e, depending on the domain of the data.
For WebNLG, we used the 2017 human evaluation results file. Run data/download_webnlg.sh to download it.
The E2E data are taken from the primary systems, then concatenated and processed with the slot error script. The result is stored here in data/e2e.tsv for simplicity.
To use this on a different dataset, you need to provide new templates for all predicates and link them to TEMPLATE_PATHS in the code. You also need to implement a data loading routine such as parse_e2e or parse_webnlg.
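As a starting point, the two pieces for a new dataset might look roughly like the sketch below. Everything in it is an assumption made for illustration: the TSV layout (columns triples and text), the "subject | predicate | object" notation, the template placeholders, and the names TEMPLATES, parse_my_dataset and triples_to_sentences are all hypothetical. The actual template file format and the signatures of parse_e2e and parse_webnlg should be taken from the code itself.

```python
import csv
import io

# Hypothetical templates: one sentence pattern per predicate.
TEMPLATES = {
    "birthPlace": "{subject} was born in {object}.",
    "occupation": "{subject} worked as {object}.",
}


def parse_my_dataset(tsv_file):
    """Read (triples, text) pairs from a TSV file object (assumed layout)."""
    examples = []
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        triples = []
        # Triples are assumed to be separated by ";" and their parts by "|".
        for raw in row["triples"].split(";"):
            subj, pred, obj = (part.strip() for part in raw.split("|"))
            triples.append((subj, pred, obj))
        examples.append({"triples": triples, "text": row["text"]})
    return examples


def triples_to_sentences(triples, templates=TEMPLATES):
    """Turn each triple into a plain sentence using its predicate's template."""
    return [
        templates[pred].format(subject=subj, object=obj)
        for subj, pred, obj in triples
    ]


# A tiny in-memory file standing in for the real dataset.
sample = io.StringIO(
    "triples\ttext\n"
    "Alan Bean | birthPlace | Wheeler, Texas; Alan Bean | occupation | a test pilot\t"
    "Alan Bean, born in Wheeler, Texas, was a test pilot.\n"
)
examples = parse_my_dataset(sample)
sentences = triples_to_sentences(examples[0]["triples"])
print(sentences)
```

A predicate without a template raises a KeyError here, which makes missing templates easy to spot before running the evaluation.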
@inproceedings{dusek_evaluating_2020,
address = {Online},
title = {Evaluating {Semantic} {Accuracy} of {Data}-to-{Text} {Generation} with {Natural} {Language} {Inference}},
booktitle = {Proceedings of the 13th {International} {Conference} on {Natural} {Language} {Generation} ({INLG} 2020)},
author = {Dušek, Ondřej and Kasner, Zdeněk},
url = {https://www.aclweb.org/anthology/2020.inlg-1.19},
pages = {131--137},
month = dec,
year = {2020},
}