ai-benchmark
There are 5 repositories under the ai-benchmark topic.
microsoft/WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking multi-modal AI agents.
TheAgentCompany/TheAgentCompany
An agent benchmark with tasks in a simulated software company.
kaykycampos/gta-benchmark
GTA (Guess The Algorithm) Benchmark - A tool for testing AI reasoning capabilities
Habitante/gta-benchmark
GTA (Guess The Algorithm) Benchmark - A tool for testing AI reasoning capabilities
petmal/MindTrial
MindTrial: Evaluate and compare AI language models (LLMs) on text-based tasks with optional file/image attachments. Supports multiple providers (OpenAI, Google, Anthropic, DeepSeek, Mistral AI, xAI), custom tasks in YAML, and HTML/CSV reports.