psunlpgroup/ReaLMistake
This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".
Language: Python | License: NOASSERTION