A minimal FastAPI proxy for OpenAI APIs — focused on Chat Completions and Embeddings.
✨ Features:
- Exposes OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/embeddings`)
- Works seamlessly with the official OpenAI Python SDK
- Supports streaming chat completions (SSE)
- Simple wrapper → real API key mapping (multi-tenant friendly)
- Lightweight & minimal — no database needed
- Deployable via Supervisor with included config
Clone the repository and install the dependencies:

```bash
git clone https://github.com/yourname/minimal-fastapi-openai-proxy.git
cd minimal-fastapi-openai-proxy
pip install -r requirements.txt
```
Copy the example config and add your keys:
```bash
cp configs/config.py.example configs/config.py
```
Edit `configs/config.py` and set your wrapper → real OpenAI key mappings:
```python
API_KEY_MAP = {
    "wrapper-key-alice": "sk-alice-real-openai-key",
    "wrapper-key-bob": "sk-bob-real-openai-key",
}
```
Clients authenticate with a wrapper key; the proxy looks up the mapped real OpenAI API key and uses it for the upstream request.
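
As a rough sketch of what that lookup can look like server-side (assuming the proxy reads the standard `Authorization: Bearer` header the OpenAI SDK sends; `resolve_real_key` is a hypothetical name, not necessarily what `server.py` actually uses):

```python
from fastapi import Header, HTTPException

from configs.config import API_KEY_MAP

def resolve_real_key(authorization: str = Header(...)) -> str:
    # Clients send "Authorization: Bearer <wrapper-key>", exactly as with OpenAI.
    wrapper_key = authorization.removeprefix("Bearer ").strip()
    real_key = API_KEY_MAP.get(wrapper_key)
    if real_key is None:
        raise HTTPException(status_code=401, detail="Unknown wrapper key")
    return real_key
```

Endpoints can then take `real_key: str = Depends(resolve_real_key)` and use it when forwarding the request upstream.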
```bash
uvicorn server:app --reload
```
Server runs at: http://localhost:8000
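
To confirm it is up, hit the health check endpoint (see the endpoint list below):

```bash
curl http://localhost:8000/health
```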
Point the SDK to your proxy and use a wrapper key:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="wrapper-key-alice",  # your wrapper key
)

# Chat
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from FastAPI proxy!"}],
)
print(resp.choices[0].message.content)

# Streaming chat (SSE): pass stream=True and iterate over the chunks
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about FastAPI"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta if chunk.choices else None
    if delta and delta.content:
        print(delta.content, end="", flush=True)
print()

# Embeddings
emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=["fast and minimal proxy"],
)
print(len(emb.data[0].embedding), "dims")
```
- `POST /v1/chat/completions`: compatible with the OpenAI Chat Completions API (supports `stream: true` for SSE).
- `POST /v1/embeddings`: compatible with the OpenAI Embeddings API.
- `GET /health`: health check endpoint.
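
For example, a raw SSE request against the chat endpoint (the wrapper key goes in the same `Authorization` header the SDK uses):

```bash
curl -N http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer wrapper-key-alice" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
```

`-N` disables curl's output buffering so the SSE chunks appear as they arrive.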
For production, use Supervisor to manage the process.
Save the following as `configs/minimal-fastapi-openai-proxy.conf`:
```ini
[program:minimal-fastapi-openai-proxy]
command=/home/ubuntu/minimal-fastapi-openai-proxy/service.sh
directory=/home/ubuntu/minimal-fastapi-openai-proxy
autostart=true
autorestart=true
environment=PATH=/home/ubuntu/miniconda3/envs/minimal-fastapi-openai-proxy/bin:/usr/bin
redirect_stderr=true
stdout_logfile=/home/ubuntu/minimal-fastapi-openai-proxy/server.log
stopasgroup=true
```
Place it in `/etc/supervisor/conf.d/` and reload:

```bash
sudo cp configs/minimal-fastapi-openai-proxy.conf /etc/supervisor/conf.d/
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl start minimal-fastapi-openai-proxy
```
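
You can then check that the process is running:

```bash
sudo supervisorctl status minimal-fastapi-openai-proxy
```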
The included `service.sh` frees the port if something is already bound to it, then runs Uvicorn:
```bash
#!/bin/bash
PORT=8000
PID=$(lsof -t -i:$PORT)
if [ -n "$PID" ]; then
  kill -9 $PID
  echo "Killed process using port $PORT"
fi
uvicorn server:app --host 0.0.0.0 --port $PORT --workers 1
```
Adjust `--workers` to fit your server resources.
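
Supervisor invokes the script directly, so make it executable before the first run (the same applies if you run it by hand):

```bash
chmod +x service.sh
./service.sh
```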
OpenAI’s API is powerful, but sometimes you need:
- Multi-tenant key management
- A lightweight proxy layer
- Compatibility with the official SDK
- Minimal setup with no DB
That’s exactly what `minimal-fastapi-openai-proxy` provides.
MIT License – free to use, modify, and share. See LICENSE for details.