AQA-Bench

Official Implementation for AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability in Algorithmic Environments

Acknowledgements

This work is partially supported by a gift from Open Philanthropy. We thank the Center for AI Safety, the Microsoft Accelerate Foundation Models Research Program, the OpenAI Researcher Access Program, and the Google Cloud Research Credits Program for supporting our computing needs.