vectara/hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

Python · Apache-2.0

Issues

Reproducing HF model Summaries
#80 opened a month ago by Noor-Nizar
1
Claude 3.5 Sonnet New
#77 opened 2 months ago by sushantnair
2
Any arxiv paper or report?
#24 opened a year ago by zhimin-z
2
How can I use the HHEM model to evaluate my LLM after finetuning?
#50 opened 6 months ago by zjq0455
1
Would you please provide a citation bibtex?
#26 opened a year ago by JacksonWuxs
2
Can we reproduce the leaderboard?
#21 opened a year ago by amir-abdi
1
Google PaLM reference
#8 opened a year ago by suddhasatwabhaumik
2
Can you add a CITATION.cff for easy citation?
#19 opened a year ago by sigjhl
1
instance level metric outputs
#6 opened a year ago by cabreraalex
4
Integrate with LiteLLM - Evaluate 100+LLMs, 92% faster
#1 opened a year ago by ishaan-jaff
1
Claude 2.1 Benchmark Missing
#16 opened a year ago by lukestanley
1
Generation parameters
#9 opened a year ago by qmdnls
1
Any date on releasing the training script for the model?
#2 opened a year ago by deshwalmahesh
2
How did you determine what is factually correct?
#4 opened a year ago by listaction
2
Google Palm API version?
#5 opened a year ago by zizhaozhang
1
GPT 4-Turbo
#3 opened a year ago by orionsolidified
1