A transparent, secure proxy for OpenAI's API with token management, rate limiting, logging, and admin UI.
- OpenAI API Compatibility
- Withering Tokens: Expiration, revocation, and rate-limiting
- Project-based Access Control with lifecycle management
- Soft Deactivation: Projects and tokens use activation flags instead of destructive deletes
- Individual Token Operations: GET, PATCH, DELETE with comprehensive audit trails
- Bulk Token Management: Revoke all tokens for a project
- Project Activation Controls: Deactivate projects to block token generation and API access
- Admin UI Actions: Edit/revoke tokens, activate/deactivate projects, bulk operations
- HTTP Response Caching: Redis-backed cache with configurable TTL, auth-aware shared caching, and streaming response support. Enable with HTTP_CACHE_ENABLED=true.
- Admin UI: Web interface for management
- Comprehensive Logging & Audit Events: Full lifecycle operation tracking for compliance
- Async Instrumentation Middleware: Non-blocking, streaming-capable instrumentation for all API calls. See docs/instrumentation.md for advanced usage and extension.
- Async Event Bus & Dispatcher: All API instrumentation events are handled via an always-on, fully asynchronous event bus (in-memory or Redis) with support for multiple subscribers, batching, retry logic, and graceful shutdown. Persistent event logging is handled by a dispatcher CLI or the --file-event-log flag.
- OpenAI Token Counting: Accurate prompt and completion token counting using tiktoken-go.
- Metrics Endpoint (provider-agnostic): Optional JSON metrics endpoint; Prometheus scraping/export is available but not required by core features
- SQLite Storage
- Docker Deployment
docker pull ghcr.io/sofatutor/llm-proxy:latest
mkdir -p ./llm-proxy/data
docker run -d \
--name llm-proxy \
-p 8080:8080 \
-v ./llm-proxy/data:/app/data \
-e MANAGEMENT_TOKEN=your-secure-management-token \
ghcr.io/sofatutor/llm-proxy:latest

# Start Redis
docker run -d --name redis -p 6379:6379 redis:alpine
# Start proxy with caching enabled
docker run -d \
--name llm-proxy \
-p 8080:8080 \
-v ./llm-proxy/data:/app/data \
-e MANAGEMENT_TOKEN=your-secure-management-token \
-e HTTP_CACHE_ENABLED=true \
-e HTTP_CACHE_BACKEND=redis \
-e REDIS_CACHE_URL=redis://redis:6379/0 \
--link redis \
ghcr.io/sofatutor/llm-proxy:latest

git clone https://github.com/sofatutor/llm-proxy.git
cd llm-proxy
make build
MANAGEMENT_TOKEN=your-secure-management-token ./bin/llm-proxy

- MANAGEMENT_TOKEN (required): Admin API access
- LISTEN_ADDR: Default :8080
- DATABASE_PATH: Default ./data/llm-proxy.db
- LOG_LEVEL: Default info
- LOG_FILE: Path to log file (stdout if empty)
- LOG_MAX_SIZE_MB: Rotate log after this size in MB (default 10)
- LOG_MAX_BACKUPS: Number of rotated log files to keep (default 5)
- AUDIT_ENABLED: Enable audit logging (default true)
- AUDIT_LOG_FILE: Audit log file path (default ./data/audit.log)
- AUDIT_STORE_IN_DB: Store audit events in database (default true)
- AUDIT_CREATE_DIR: Create audit log directories (default true)
- OBSERVABILITY_ENABLED: Deprecated; the async event bus is now always enabled
- OBSERVABILITY_BUFFER_SIZE: Event buffer size for instrumentation events (default 1000)
- FILE_EVENT_LOG: Path to persistent event log file (enables file event logging via dispatcher)
- HTTP_CACHE_ENABLED: Enable HTTP response caching (default true)
- HTTP_CACHE_BACKEND: Cache backend (redis or in-memory, default in-memory)
- REDIS_CACHE_URL: Redis connection URL (default redis://localhost:6379/0 when backend=redis)
- REDIS_CACHE_KEY_PREFIX: Cache key prefix (default llmproxy:cache:)
- HTTP_CACHE_MAX_OBJECT_BYTES: Maximum cached object size in bytes (default 1048576)
- HTTP_CACHE_DEFAULT_TTL: Default TTL in seconds when upstream doesn't specify (default 300)
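For a quick local start, the variables above can be combined into a small launch script. A minimal sketch with placeholder values (enable Redis caching only if a Redis instance is reachable):

# Placeholder values for a local run; replace secrets and paths for your setup
export MANAGEMENT_TOKEN=change-me
export LISTEN_ADDR=:8080
export DATABASE_PATH=./data/llm-proxy.db
export LOG_LEVEL=info
export HTTP_CACHE_ENABLED=true
export HTTP_CACHE_BACKEND=redis
export REDIS_CACHE_URL=redis://localhost:6379/0
./bin/llm-proxy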
See docs/api-configuration.md and docs/instrumentation.md for all options and advanced usage.
apis:
openai:
param_whitelist:
model:
- gpt-4o
- gpt-4.1-*
allowed_origins:
- https://www.sofatutor.com
- http://localhost:4000
required_headers:
- origin

See docs/issues/phase-7-param-cors-whitelist.md for advanced configuration and rationale.
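With a configuration like the above in effect, a request naming a model outside the whitelist is rejected by the proxy instead of being forwarded. A hedged illustration (assuming a 4xx rejection; the exact status code and error body are defined by the proxy, see the linked doc):

# Allowed: model is in the whitelist and the Origin header matches allowed_origins
curl -H "Authorization: Bearer <withering-token>" \
  -H "Origin: http://localhost:4000" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hi"}]}' \
  http://localhost:8080/v1/chat/completions

# Rejected: gpt-3.5-turbo is not in the model whitelist, so the proxy refuses the request
curl -H "Authorization: Bearer <withering-token>" \
  -H "Origin: http://localhost:4000" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"Hi"}]}' \
  http://localhost:8080/v1/chat/completions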
- /manage/projects — Project lifecycle management
  - GET /manage/projects — List all projects
  - POST /manage/projects — Create a new project (defaults to active)
- /manage/projects/{projectId}
  - GET — Get project details
  - PATCH — Update a project (supports the is_active field)
  - DELETE — 405 Method Not Allowed (no destructive deletes)
- /manage/projects/{projectId}/tokens/revoke — Bulk token operations
  - POST — Revoke all tokens for project
- /manage/tokens — Token lifecycle management
  - GET /manage/tokens — List all tokens (filter by project, active status)
  - POST /manage/tokens — Generate a new token (blocked if project inactive)
- /manage/tokens/{tokenId}
  - GET — Get token details
  - PATCH — Update token (activate/deactivate)
  - DELETE — Revoke token (soft deactivation)
All management endpoints require:
Authorization: Bearer <MANAGEMENT_TOKEN>
# Create active project
curl -X POST http://localhost:8080/manage/projects \
-H "Authorization: Bearer $MANAGEMENT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "My Project", "openai_api_key": "sk-..."}'
# Update project activation status
curl -X PATCH http://localhost:8080/manage/projects/<project-id> \
-H "Authorization: Bearer $MANAGEMENT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"is_active": false}'
# Bulk revoke project tokens
curl -X POST http://localhost:8080/manage/projects/<project-id>/tokens/revoke \
-H "Authorization: Bearer $MANAGEMENT_TOKEN"
# Revoke individual token
curl -X DELETE http://localhost:8080/manage/tokens/<token-id> \
-H "Authorization: Bearer $MANAGEMENT_TOKEN"POST /v1/*— Forwarded to OpenAI, requires withering token
Example:
curl -H "Authorization: Bearer <withering-token>" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}]}' \
http://localhost:8080/v1/chat/completions

Note: The proxy API is not documented with Swagger/OpenAPI except for authentication and allowed paths/methods. For backend schemas, refer to the provider's documentation.
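Streaming responses are passed through as well (see the streaming support noted in the features above). A sketch of a streamed chat completion, assuming the upstream model accepts stream: true:

# -N disables curl's output buffering so streamed chunks appear as they arrive
curl -N -H "Authorization: Bearer <withering-token>" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"Hello"}],"stream":true}' \
  http://localhost:8080/v1/chat/completions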
/admin/ — Web interface with lifecycle management

- Project activation/deactivation controls
- Token revocation and editing
- Bulk token management by project
- Audit event viewing (when enabled)
The CLI provides full project and token lifecycle management via the llm-proxy manage command. All subcommands support the --manage-api-base-url flag (default: http://localhost:8080) and require a management token (via --management-token or the MANAGEMENT_TOKEN env variable).
# List projects with activation status
llm-proxy manage project list --manage-api-base-url http://localhost:8080 --management-token <token>
# Get project details
llm-proxy manage project get <project-id> --manage-api-base-url http://localhost:8080 --management-token <token>
# Create project (defaults to active)
llm-proxy manage project create --name "My Project" --openai-key sk-... --manage-api-base-url http://localhost:8080 --management-token <token>
# Update project (supports activation changes)
# Note: --is-active flag not yet available in CLI; use direct API calls for activation control
curl -X PATCH http://localhost:8080/manage/projects/<project-id> \
-H "Authorization: Bearer $MANAGEMENT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"is_active": false}'
# CLI currently supports name and API key updates
llm-proxy manage project update <project-id> --name "New Name" --manage-api-base-url http://localhost:8080 --management-token <token>
# Project deletion not supported (405) - use deactivation instead
# llm-proxy manage project delete <project-id>  # This will fail with 405

# Generate token (blocked if project inactive via API validation)
llm-proxy manage token generate --project-id <project-id> --duration 24 --manage-api-base-url http://localhost:8080 --management-token <token>
# Note: Token listing, details, and revocation not yet available in CLI
# Use direct API calls for these operations:
# List tokens with filtering
curl -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
"http://localhost:8080/manage/tokens?project_id=<project-id>&active_only=true"
# Get token details
curl -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
"http://localhost:8080/manage/tokens/<token-id>"
# Revoke individual token
curl -X DELETE -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
"http://localhost:8080/manage/tokens/<token-id>"
# Bulk revoke project tokens
curl -X POST -H "Authorization: Bearer $MANAGEMENT_TOKEN" \
"http://localhost:8080/manage/projects/<project-id>/tokens/revoke"--manage-api-base-url— Set the management API base URL (default: http://localhost:8080)--management-token— Provide the management token (or setMANAGEMENT_TOKENenv)--json— Output results as JSON (optional)
The LLM Proxy includes a powerful, pluggable dispatcher system for sending observability events to external services. The dispatcher supports multiple backends and can be run as a separate service.
- file: Write events to JSONL file
- lunary: Send events to Lunary.ai platform
- helicone: Send events to Helicone platform
# File output
llm-proxy dispatcher --service file --endpoint events.jsonl
# Lunary integration
export LLM_PROXY_API_KEY="your-lunary-api-key"
llm-proxy dispatcher --service lunary
# Helicone integration
llm-proxy dispatcher --service helicone --api-key your-helicone-key
# Custom batch size and buffer
llm-proxy dispatcher --service lunary --api-key $API_KEY --batch-size 50 --buffer 2000

The dispatcher can be deployed in multiple ways:
- Standalone Process: Run as a separate service for production
- Sidecar Container: Deploy alongside the main proxy in Kubernetes
- Background Mode: Use the --detach flag for daemon-like operation (see the sketch below)
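For example, background mode just adds --detach to an otherwise normal dispatcher invocation. A minimal sketch using the file backend shown earlier:

# Run the file dispatcher detached (daemon-like), appending events to events.jsonl
llm-proxy dispatcher --service file --endpoint events.jsonl --detach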
See docs/instrumentation.md for detailed configuration and architecture.
Warning: Event loss can occur if the Redis event log is configured with TTL/max length values that are too low for your dispatcher lag and throughput. In production, increase Redis TTL and list length to cover worst-case backlogs and keep the dispatcher running with sufficient batch size/throughput. For strict guarantees, use a durable queue (e.g., Redis Streams with consumer groups or Kafka). See the Production Reliability section in
docs/instrumentation.md.
Note: The in-memory event bus only works within a single process. For multi-process setups (e.g., running the proxy and dispatcher as separate processes or containers), you must use Redis as the event bus backend.
A Redis service is included in the docker-compose.yml for local development:
db:
image: redis:7
container_name: llm-proxy-redis
ports:
- "6379:6379"
restart: unless-stopped

Set the event bus backend to Redis by using the appropriate environment variable or CLI flag (see documentation for the exact flag):
LLM_PROXY_EVENT_BUS=redis llm-proxy ...
LLM_PROXY_EVENT_BUS=redis llm-proxy dispatcher ...

This ensures both the proxy and dispatcher share events via Redis, enabling full async pipeline testing and production-like operation.
- /cmd — Entrypoints (proxy, eventdispatcher)
- /internal — Core logic (token, database, proxy, admin, logging, eventbus, dispatcher)
- /api — OpenAPI specs
- /web — Admin UI static assets
- /docs — Full documentation
- Tokens support expiration, revocation, and rate limits
- Management API protected by MANAGEMENT_TOKEN
- Admin UI uses basic auth (ADMIN_USER, ADMIN_PASSWORD)
- Logs stored locally and/or sent to external backends
- Use HTTPS in production (via reverse proxy)
- See docs/security.md and docs/production.md for best practices
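The admin UI credentials and management token are plain environment variables. A minimal sketch for a local run (placeholder values; use strong secrets and HTTPS in production):

# Placeholders only; inject real secrets via your secret manager in production
export MANAGEMENT_TOKEN=change-me
export ADMIN_USER=admin
export ADMIN_PASSWORD=change-me-too
./bin/llm-proxy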
- Multi-stage Dockerfile builds a static binary and ships a minimal Alpine runtime
- Runs as non-root user appuser with read-only filesystem by default
- Healthcheck hits /health; see docker-compose.yml or the Dockerfile HEALTHCHECK
- Volumes: /app/data, /app/logs, /app/config, /app/certs
- Example local build/test:

make docker-build
make docker-run
make docker-smoke

Images are built and published to GitHub Container Registry on pushes to main and tags v*.
Registry: ghcr.io/sofatutor/llm-proxy
Workflow: .github/workflows/docker.yml builds for linux/amd64 and linux/arm64 and pushes tagged, labeled images.
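To verify a local build or a pulled image end to end, the /health endpoint referenced above can be probed directly. A minimal sketch assuming the default 8080 port mapping:

# Exits non-zero if the proxy does not report healthy
curl -f http://localhost:8080/health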
This README provides a quick overview and getting started guide. For comprehensive documentation, see the /docs directory:
Key Documentation:
- CLI Reference - Complete command-line interface documentation
- Go Package Documentation - Using LLM Proxy packages in your applications
- Architecture Guide - System architecture and design
- API Configuration - Advanced API provider configuration
- Security Best Practices - Production security guidelines
- Instrumentation Guide - Event system and observability
For Developers:
- OpenAPI Specification - Machine-readable API definitions
- Contributing Guidelines - How to contribute to the project
MIT License