jsbaan/calibration-on-disagreement-data

Code accompanying the EMNLP 2022 paper "Stop Measuring Calibration When Humans Disagree". In the paper, we show that popular calibration metrics such as ECE are problematic in settings where more than one answer is acceptable, and we argue for several metrics that take the full human judgement distribution into account.
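As a rough illustration of the issue (a minimal sketch, not code from this repository), the snippet below compares standard top-label ECE computed against majority-vote labels with a simple instance-level distance to the human judgement distribution. The function names, the toy data, and the use of total variation distance as the distribution-level measure are assumptions made for this example; they are not necessarily the metrics proposed in the paper.

```python
import math


def expected_calibration_error(probs, labels, n_bins=10):
    """Top-label ECE with equal-width confidence bins (standard definition)."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        # Confidence in (k/n_bins, (k+1)/n_bins] goes to bin k.
        k = min(n_bins - 1, max(0, math.ceil(conf * n_bins) - 1))
        bins[k].append((conf, 1.0 if pred == y else 0.0))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(a for _, a in members) / len(members)
            ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


def mean_total_variation(probs, human_dists):
    """Mean total variation distance between model and human distributions.

    Illustrative choice of distribution-level measure (an assumption here).
    """
    distances = [
        0.5 * sum(abs(p_i - h_i) for p_i, h_i in zip(p, h))
        for p, h in zip(probs, human_dists)
    ]
    return sum(distances) / len(distances)


# Hypothetical toy data: two items on which annotators disagree.
human_dists = [[0.6, 0.4], [0.5, 0.5]]
majority_labels = [0, 0]  # the tie on the second item is broken arbitrarily
model_probs = [[0.6, 0.4], [0.5, 0.5]]  # model matches the human distribution

print(round(expected_calibration_error(model_probs, majority_labels), 2))  # 0.45
print(mean_total_variation(model_probs, human_dists))  # 0.0
```

In this toy case the model reproduces the human judgement distribution exactly, yet ECE against majority labels still reports a large calibration error, because accuracy on the majority label is 1.0 while the model's confidence is only 0.5 to 0.6.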

Jupyter Notebook

Stargazers

glushkovato, Instituto Superior Técnico
Kaleidophon, Copenhagen
Mckysse, LMU Munich
wilkeraziz, University of Amsterdam