minimal-fastapi-openai-proxy

A minimal FastAPI proxy for OpenAI APIs — focused on Chat Completions and Embeddings.

✨ Features

  • Exposes OpenAI-compatible endpoints (/v1/chat/completions, /v1/embeddings)
  • Works seamlessly with the official OpenAI Python SDK
  • Supports streaming chat completions (SSE)
  • Simple wrapper → real API key mapping (multi-tenant friendly)
  • Lightweight & minimal — no database needed
  • Deployable via Supervisor with included config

🚀 Quick Start

1. Clone & install

git clone https://github.com/yourname/minimal-fastapi-openai-proxy.git
cd minimal-fastapi-openai-proxy
pip install -r requirements.txt

2. Configure keys

Copy the example config and add your keys:

cp configs/config.py.example configs/config.py

Edit configs/config.py and set your wrapper → real OpenAI key mappings:

API_KEY_MAP = {
    "wrapper-key-alice": "sk-alice-real-openai-key",
    "wrapper-key-bob":   "sk-bob-real-openai-key",
}

Clients authenticate with a wrapper key; the proxy swaps it for the mapped real OpenAI API key before forwarding each request upstream.
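
Under the hood, only a small lookup is needed for that swap. As a rough sketch (illustrative names, not necessarily the actual server.py implementation), the proxy can resolve the wrapper key from the Authorization header with a FastAPI dependency:

from fastapi import Header, HTTPException

from configs.config import API_KEY_MAP

def resolve_openai_key(authorization: str = Header(...)) -> str:
    # The SDK sends the wrapper key as "Authorization: Bearer <key>".
    wrapper_key = authorization.removeprefix("Bearer ").strip()
    real_key = API_KEY_MAP.get(wrapper_key)
    if real_key is None:
        raise HTTPException(status_code=401, detail="Unknown API key")
    return real_key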

3. Run the server (local dev)

uvicorn server:app --reload

Server runs at: http://localhost:8000
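
To sanity-check that it's up, you can hit the health endpoint from Python (a quick snippet assuming httpx is installed; requests works the same way):

import httpx

resp = httpx.get("http://localhost:8000/health")
print(resp.status_code, resp.text)  # expect 200 when the proxy is running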


🔑 Usage with OpenAI SDK

Point the SDK to your proxy and use a wrapper key:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="wrapper-key-alice",  # your wrapper key
)

# Chat
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from FastAPI proxy!"}],
)
print(resp.choices[0].message.content)

# Streaming chat (stream=True yields chunks as they arrive over SSE)
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about FastAPI"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Embeddings
emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=["fast and minimal proxy"],
)
print(len(emb.data[0].embedding), "dims")

🧩 API Endpoints

  • POST /v1/chat/completions – compatible with the OpenAI Chat Completions API (supports stream: true for SSE).

  • POST /v1/embeddings – compatible with the OpenAI Embeddings API.

  • GET /health – health check endpoint.
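
To make the compatibility claim concrete, here is a minimal sketch of the forwarding logic for the non-streaming case. It is illustrative rather than the actual server.py: it assumes httpx for the upstream call and reuses the key-lookup idea from the config step above:

import httpx
from fastapi import Depends, FastAPI, Header, HTTPException, Request
from fastapi.responses import JSONResponse

from configs.config import API_KEY_MAP

app = FastAPI()
OPENAI_BASE_URL = "https://api.openai.com/v1"

def resolve_openai_key(authorization: str = Header(...)) -> str:
    # Same wrapper -> real key lookup sketched in "Configure keys".
    real_key = API_KEY_MAP.get(authorization.removeprefix("Bearer ").strip())
    if real_key is None:
        raise HTTPException(status_code=401, detail="Unknown API key")
    return real_key

@app.post("/v1/embeddings")
async def proxy_embeddings(
    request: Request,
    real_key: str = Depends(resolve_openai_key),
):
    # Forward the client's JSON body unchanged, swapping in the real key.
    payload = await request.json()
    async with httpx.AsyncClient(timeout=60.0) as client:
        upstream = await client.post(
            f"{OPENAI_BASE_URL}/embeddings",
            headers={"Authorization": f"Bearer {real_key}"},
            json=payload,
        )
    # Relay OpenAI's status and body verbatim so SDK error handling keeps working.
    return JSONResponse(status_code=upstream.status_code, content=upstream.json())

Streaming chat completions follow the same pattern, except the proxy relays the upstream SSE bytes through a StreamingResponse instead of buffering the JSON body.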


📦 Deployment with Supervisor

For production, use Supervisor to manage the process.

1. Supervisor config

Save as configs/minimal-fastapi-openai-proxy.conf:

[program:minimal-fastapi-openai-proxy]
command=/home/ubuntu/minimal-fastapi-openai-proxy/service.sh
directory=/home/ubuntu/minimal-fastapi-openai-proxy
autostart=true
autorestart=true
environment=PATH=/home/ubuntu/miniconda3/envs/minimal-fastapi-openai-proxy/bin:/usr/bin
redirect_stderr=true
stdout_logfile=/home/ubuntu/minimal-fastapi-openai-proxy/server.log
stopasgroup=true

Place it in /etc/supervisor/conf.d/ and reload:

sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl start minimal-fastapi-openai-proxy

2. service.sh

Included script to run Uvicorn:

#!/bin/bash
PORT=8000

# Free the port if a previous instance is still bound to it.
PID=$(lsof -t -i:"$PORT")
if [ -n "$PID" ]; then
    kill -9 $PID
    echo "Killed process using port $PORT"
fi

# exec replaces the shell so Supervisor's stop signals reach Uvicorn directly.
exec uvicorn server:app --host 0.0.0.0 --port "$PORT" --workers 1

Make the script executable (chmod +x service.sh) and adjust --workers to fit your server's resources.


🙌 Why?

OpenAI’s API is powerful, but sometimes you need:

  • Multi-tenant key management
  • A lightweight proxy layer
  • Compatibility with the official SDK
  • Minimal setup with no DB

That’s exactly what minimal-fastapi-openai-proxy provides.


📜 License

MIT License – free to use, modify, and share. See LICENSE for details.