aws-banjo/llm-colosseum
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
Jupyter Notebook · MIT