vectara/hallucination-leaderboard

Can we reproduce the leaderboard?

Are the code and data needed to reproduce the leaderboard released?
For example, if we have a model/method under development and want to assess its tendency to hallucinate under RAG conditions, how can your leaderboard be used?

Hello Amir, and thank you for writing.

The model used to produce the leaderboard is available on HuggingFace: Hughes Hallucination Evaluation Model (HEM).

Beyond that, take a look at leaderboard_summaries.csv.

> if we have a model/method under development and want to assess its tendency to hallucinate under RAG conditions, how can your leaderboard be used?

My recommendation is to use HEM directly to assess your model's tendency to hallucinate: have your model generate answers or summaries from source documents, then score each (source, output) pair with HEM. Use a set of queries and documents that is realistic for the use case you want to support.
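As a rough sketch of what that could look like in Python: the helper names below (`load_hem_scorer`, `hallucination_rate`) are hypothetical and not part of this repository. The model ID, the `sentence-transformers` `CrossEncoder` loading path, and the 0.5 cutoff (scores near 1 read as factually consistent, near 0 as hallucinated) are assumptions to check against the current HEM model card, since the loading instructions may differ between model versions.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source document, model output to check)
Scorer = Callable[[Sequence[Pair]], Sequence[float]]


def load_hem_scorer(
    model_id: str = "vectara/hallucination_evaluation_model",  # assumed model ID
) -> Scorer:
    """Wrap HEM as a scorer function.

    Assumption: the model loads as a sentence-transformers CrossEncoder.
    Verify this against the model card before relying on it.
    """
    from sentence_transformers import CrossEncoder  # lazy import, optional dependency

    model = CrossEncoder(model_id)

    def score(pairs: Sequence[Pair]) -> List[float]:
        # CrossEncoder.predict takes a list of [text_a, text_b] pairs.
        return [float(s) for s in model.predict([list(p) for p in pairs])]

    return score


def hallucination_rate(
    pairs: Sequence[Pair], scorer: Scorer, threshold: float = 0.5
) -> float:
    """Fraction of pairs whose consistency score falls below `threshold`."""
    if not pairs:
        raise ValueError("need at least one (source, output) pair")
    scores = list(scorer(pairs))
    if len(scores) != len(pairs):
        raise ValueError("scorer must return one score per pair")
    flagged = sum(1 for s in scores if s < threshold)
    return flagged / len(pairs)


if __name__ == "__main__":
    # Replace these with documents and outputs from your own use case.
    examples = [
        ("The store opens at 9am on weekdays.", "The store opens at 9am on weekdays."),
        ("The store opens at 9am on weekdays.", "The store is open 24 hours a day."),
    ]
    print(hallucination_rate(examples, load_hem_scorer()))
```

Passing the scorer in as a function keeps the rate calculation independent of how the model is loaded, so you can swap in a different loading path without touching the rest.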

Please let me know if anything is not clear.