/arena-hard-auto

Arena-Hard-Auto: An automatic LLM benchmark.

Primary Language: Jupyter Notebook
License: Apache License 2.0 (Apache-2.0)
