huggingface/evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Jupyter Notebook, NOASSERTION
Stargazers
- aflah02, Max Planck Institute for Software Systems: MPI SWS
- ajay-sreeram, Target
- amarrella, @bandstandsoftware
- atasoglu, near the Sun
- baellouf, paris
- catastropiyush, CSIR NCL
- chainyo, @owkin
- chuanmingliu, Westeros
- dkapt, Greece
- dnhkng, Munich, Germany
- drvenabili, iguanodon.ai
- eduardoveloso, Take Blip
- fgbelidji, Hugging Face
- gm8xx8
- gonz-mart
- imr555, Neovotech
- ipruning, Edinburgh, UK ⇌ Shanghai, China
- jplasser, Linz, Austria
- kgourgou
- lantzk
- lewtun, @huggingface
- Mihaiii, Romania
- MikeBirdTech, Tool Use
- MoritzLaurer, @HuggingFace
- mukulpatnaik, @portalcorp
- Murgio, ETH Zürich
- NathanHB, Paris
- omkar-334, Hyderabad
- othmaneabou, @DataDog
- SaiNikhileshReddy, @WadhwaniAI
- SinclairCoder, China
- soumyasj, Tübingen, Germany
- steve-jarrett, Paris, France
- Vaibhavs10, @huggingface
- vvonchain
- younesbelkada, @huggingface