BENCHY

Benchmarks you can feel

We all love benchmarks, but there's nothing like a hands-on vibe check. What if we could meet somewhere in the middle?

Enter BENCHY. A chill, live benchmark tool that lets you see the performance, price, and speed of LLMs in a side-by-side comparison for SPECIFIC use cases.

Watch the walkthrough video here

Live Benchmark Tools

Long Tool Calling
- Goal: Understand the best LLMs and techniques for LONG chains of tool calls / function calls (15+).
- Watch the walkthrough video here
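
To make the "long chain" goal concrete, here is a minimal sketch of one way a chain of tool calls could be scored. This is not BENCHY's own scoring, which this README does not describe. The function name `score_tool_chain` and the tool names in the usage line are hypothetical, and the metric (in-order matches before the first divergence) is an assumption chosen for simplicity.

```python
def score_tool_chain(expected: list[str], actual: list[str]) -> float:
    """Hypothetical metric: share of the chain matched in order before the first mismatch.

    Dividing by the longer of the two lists penalizes both missing and extra calls.
    """
    if not expected and not actual:
        return 1.0  # nothing was expected and nothing was called
    matched = 0
    for want, got in zip(expected, actual):
        if want != got:
            break  # stop counting at the first divergence
        matched += 1
    return matched / max(len(expected), len(actual))


if __name__ == "__main__":
    # Hypothetical tool names, for illustration only
    expected_calls = ["open_file", "read_file", "summarize", "save_file"]
    model_calls = ["open_file", "read_file", "translate", "save_file"]
    print(score_tool_chain(expected_calls, model_calls))
```

A stricter or more lenient metric (exact match only, or order-insensitive overlap) may suit some use cases better; the point is only that a 15+ call chain gives a model many places to diverge.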
Multi Autocomplete
- Goal: Understand how Claude 3.5 Haiku and GPT-4o predictive outputs compare to existing models.
- Watch the walkthrough video here

Important Files

.env - Environment variables for API keys
server/.env - Environment variables for API keys
package.json - Front-end dependencies
server/pyproject.toml - Server dependencies
src/store/* - Stores all front-end state and prompts
src/api/* - API layer for all requests
server/server.py - Server routes
server/modules/llm_models.py - All LLM models
server/modules/openai_llm.py - OpenAI LLM
server/modules/anthropic_llm.py - Anthropic LLM
server/modules/gemini_llm.py - Gemini LLM

Setup

Get API Keys

Client Setup

# Install dependencies using bun (recommended)
bun install

# Or using npm
npm install

# Or using yarn
yarn install

# Start development server
bun dev  # or npm run dev / yarn dev

Server Setup

# Move into server directory
cd server

# Create the virtual environment and install dependencies using uv
uv sync

# Set up environment variables
cp .env.sample .env

# Set EVERY .env key with your API keys and settings
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GEMINI_API_KEY=

# Start server
uv run python server.py

# Run tests (beware: this will hit live APIs and cost money)
uv run pytest
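
Since every key in .env has to be filled in before the server can reach the providers, a quick pre-flight check can save a failed run. The sketch below is an illustration and not part of the repository: the helper `missing_env_keys` is a hypothetical name, and it assumes the simple `KEY=value` layout shown above.

```python
from pathlib import Path

# Keys listed in the server setup steps above
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_env_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")  # assumes you run this from the server directory
    if env_file.exists():
        missing = missing_env_keys(env_file.read_text())
        print("Missing keys:", ", ".join(missing) if missing else "none")
    else:
        print("No .env found; run `cp .env.sample .env` first")
```

The check only confirms that each key has a value; it does not verify that the key is valid with the provider.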

Dev Notes & Caveats

See src/components/DevNotes.vue for limitations

disler/benchy
