Arena-Hard-Auto: An automatic LLM benchmark.
Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).