aws-banjo/llm-colosseum
Benchmark LLMs by having them fight in Street Fighter 3! The new way to evaluate the quality of an LLM.
Jupyter Notebook, MIT license