ReaLMistake

This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".