chandar-lab/EpiK-Eval

Benchmark to evaluate the capability of language models to consolidate and recall information from multiple training documents.

Python, MIT License
