/llm-colosseum

Benchmark LLMs by having them fight in Street Fighter 3! A new way to evaluate the quality of an LLM.

Primary Language: Jupyter Notebook
