Data repository for our paper "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models.
This repo contains the annotated data we used to train our evaluator, in labeled_model_output, and the model outputs with their mapping results, in outputs.
We have also released our trained classifiers on Hugging Face. Please try them out.
If you find this repository useful or our work is related to your research, please cite our paper:
@article{wang2024my,
title={"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models},
author={Wang, Xinpeng and Ma, Bolei and Hu, Chengzhi and Weber-Genzel, Leon and R{\"o}ttger, Paul and Kreuter, Frauke and Hovy, Dirk and Plank, Barbara},
journal={arXiv preprint arXiv:2402.14499},
year={2024}
}