/Cheating-LLM-Benchmarks

[SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Primary language: Jupyter Notebook. License: MIT.
