tangled

Code, data, and additional analysis for the paper "Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics".

https://www.aclweb.org/anthology/2020.acl-main.448/

Layout

Data: files with WMT19 system-level metric scores and human scores.

top-n: a PDF of figures comparing the top-n and rolling-window subsampling methods for all language pairs, as described in Section 4.1 of the paper.

Outliers: code to compute correlations with and without outliers, as described in Section 4.2 of the paper.
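
To illustrate what "correlations with and without outliers" can look like in practice, here is a minimal sketch. It is not the code in the Outliers directory. It assumes that outlier systems are flagged on their human scores with a median-absolute-deviation (MAD) rule, using a scale factor of 1.483 and a cutoff of 2.5; check Section 4.2 of the paper for the exact procedure. The function names and the toy scores are hypothetical and are not taken from the WMT19 files.

```python
import math
from statistics import median


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mad_outlier_flags(scores, cutoff=2.5, scale=1.483):
    """Flag scores whose scaled distance from the median exceeds the cutoff.

    The cutoff and scale values are assumptions of this sketch.
    """
    med = median(scores)
    mad = scale * median([abs(s - med) for s in scores])
    if mad == 0:
        # All deviations are (near) identical, so nothing is flagged.
        return [False] * len(scores)
    return [abs(s - med) / mad > cutoff for s in scores]


def correlations_with_and_without_outliers(human, metric, cutoff=2.5):
    """Return (r_all, r_without_outliers, flags) for system-level scores."""
    flags = mad_outlier_flags(human, cutoff)
    kept_human = [h for h, f in zip(human, flags) if not f]
    kept_metric = [m for m, f in zip(metric, flags) if not f]
    return pearson(human, metric), pearson(kept_human, kept_metric), flags


# Toy example with made-up scores: the last system is far worse than the rest.
human_scores = [0.1, 0.2, 0.15, 0.25, 0.3, -2.0]
metric_scores = [30, 29, 31, 28, 32, 5]
r_all, r_clean, flags = correlations_with_and_without_outliers(
    human_scores, metric_scores
)
print(f"all systems: {r_all:.3f}, without outliers: {r_clean:.3f}")
```

In this toy data a single very poor system dominates the correlation computed over all systems; once it is excluded, the correlation among the remaining systems is much weaker.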
