/llm-colosseum

Benchmark LLMs by having them fight in Street Fighter 3! The new way to evaluate the quality of an LLM.

Primary language: Jupyter Notebook. License: MIT.
