carlini/yet-another-applied-llm-benchmark
A benchmark to evaluate language models on questions I've previously asked them to solve.
Python · GPL-3.0
Watchers
- aviv12825
- carlini
- cynepiaadmin (Cynepia Technologies)
- drkostas (University of Tennessee, Knoxville)
- dwindibank (Waterloo, ON)
- eemailme
- HashmatShadab (Abu Dhabi, UAE)
- HuangXihuang (Xiamen, China)
- katelee168
- kathakali
- melindadevins (Visla.us)
- runrunliuliu
- sankeerthrao (Google Research)
- shimomurakei
- shock (Austin, TX)
- srxzr
- trappedinspacetime (For Personal Use)