chandar-lab/EpiK-Eval
Benchmark to evaluate the capability of language models to consolidate and recall information from multiple training documents.
Python · MIT