carlini/yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
Python · GPL-3.0