carlini/yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

Python, GPL-3.0

14 Issues
936 Stargazers
17 Watchers

Watchers

aviv12825
carlini
cynepiaadmin
Cynepia Technologies
drkostas
University of Tennessee, Knoxville
dwindibank
Waterloo, ON
eemailme
HashmatShadab
Abu Dhabi, UAE
HuangXihuang
Xiamen, China
katelee168
kathakali
melindadevins
Visla.us
runrunliuliu
sankeerthrao
Google Research
shimomurakei
shock
Austin, TX
srxzr
trappedinspacetime
For Personal Use
