Code for the paper "Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies" by Tom Kocmi, Vilém Zouhar, Christian Federmann, and Matt Post.
@misc{kocmi2024navigating,
title={Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies},
author={Tom Kocmi and Vilém Zouhar and Christian Federmann and Matt Post},
year={2024},
eprint={2401.06760},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
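See the MT thresholds tool, which converts between a metric's score difference and the corresponding accuracy. Install it with pip and run it from the command line: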
pip3 install mt-thresholds
# a BLEU difference of 1.00 corresponds to an accuracy of 63.989%
mt-thresholds bleu 1.00
# ChrF needs 0.710 difference for the same accuracy as a 1.00 BLEU difference
mt-thresholds chrf 0.63989 --delta
Or use it from Python:
import mt_thresholds
mt_thresholds.accuracy(1.0, "bleu") # 0.63989
mt_thresholds.delta(0.63989, "chrf") # 0.665
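The two calls above can be chained to translate a score difference in one metric into the difference another metric would need for the same accuracy. The following is a minimal sketch, not part of the package: the helper name `equivalent_delta` and its `thresholds` parameter are hypothetical, and it assumes only that `accuracy` and `delta` accept the positional arguments shown above.

```python
import importlib


def equivalent_delta(score_delta, source_metric, target_metric, thresholds=None):
    """Map a score difference in one metric to the matching difference in another.

    `equivalent_delta` is a hypothetical helper. `thresholds` is any object
    exposing accuracy(delta, metric) and delta(accuracy, metric); when it is
    omitted, the installed mt_thresholds package is imported and used.
    """
    if thresholds is None:
        # Imported lazily so the helper can be defined without the package.
        thresholds = importlib.import_module("mt_thresholds")
    # Step 1: accuracy that corresponds to the source metric's difference.
    acc = thresholds.accuracy(score_delta, source_metric)
    # Step 2: difference the target metric needs to reach that same accuracy.
    return thresholds.delta(acc, target_metric)
```

With the package installed, `equivalent_delta(1.0, "bleu", "chrf")` would perform the same two steps as the snippet above in a single call.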
We plan to release the code for replicating the WMT results in the upcoming months.