This is a FastAPI-based relay server that forwards LLM requests from a private network to an external endpoint that exposes an OpenAI-compatible API (such as OpenAI or OpenRouter). It is designed for agentic AI frameworks that need to reach such an endpoint but run on machines without direct internet access.
The relay server only works with OpenAI-compatible APIs.
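To make the idea concrete, the core of such a relay is a FastAPI route that re-issues each incoming request against the upstream base URL, attaching the provider API key on the way out. The sketch below is illustrative rather than the actual `main.py`: the `httpx` client, the `UPSTREAM_BASE_URL` constant, and the route body are assumptions; only `LLM_RELAY_API_KEY` comes from this README.

```python
# Minimal sketch of the relay idea (not the actual main.py).
import os

import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
UPSTREAM_BASE_URL = "https://api.openai.com/v1"  # assumption; the real server accepts --base-url

@app.post("/v1/chat/completions")
async def relay_chat_completions(request: Request) -> JSONResponse:
    # Forward the client's JSON payload and attach the provider API key from the .env file.
    payload = await request.json()
    headers = {"Authorization": f"Bearer {os.environ['LLM_RELAY_API_KEY']}"}
    async with httpx.AsyncClient(timeout=120) as client:
        upstream = await client.post(
            f"{UPSTREAM_BASE_URL}/chat/completions", json=payload, headers=headers
        )
    return JSONResponse(status_code=upstream.status_code, content=upstream.json())
```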
- Create a virtual environment using uv:

  ```
  uv venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install dependencies using uv:

  ```
  uv pip install -r requirements.txt
  ```

- Create a `.env` file with your API key for the LLM provider:

  ```
  LLM_RELAY_API_KEY=your_api_key_here
  ```
Start the server with:

```
python main.py
```

or

```
uvicorn main:app --host 0.0.0.0 --port 8000
```

You can also pass the --base-url flag to point the relay at a specific base URL; by default it uses OpenAI's API URL. Add the --debug flag to show HTTP payloads.
The server will be available at http://your_server_ip:8000.
The relay server exposes the following endpoints:
- `POST /v1/chat/completions`: forwards chat completion requests to the upstream endpoint
- `POST /v1/completions`: forwards completion requests to the upstream endpoint
You can use these endpoints just like you would use an OpenAI-compatible API directly. The server will handle the authentication and forwarding of requests.
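For example, an agent on the private network can point a standard OpenAI client at the relay. The host, model name, and placeholder API key below are illustrative; whether the relay inspects the client-side key at all is an assumption here.

```python
# Example client call routed through the relay (host and model are placeholders).
from openai import OpenAI

client = OpenAI(
    base_url="http://your_server_ip:8000/v1",  # the relay, not the provider
    api_key="unused",  # the relay holds the real provider key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from the private network"}],
)
print(response.choices[0].message.content)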
You can configure the following environment variables:
- `LLM_RELAY_API_KEY`: Your API key
- `DEFAULT_MODEL`: Default model to use if not specified in the request (optional)
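As a rough illustration of the `DEFAULT_MODEL` behavior described above (the function and variable names here are assumptions, not the server's actual code):

```python
# Sketch of a DEFAULT_MODEL fallback (illustrative only).
import os

DEFAULT_MODEL = os.environ.get("DEFAULT_MODEL")

def apply_default_model(payload: dict) -> dict:
    # If the incoming request omitted "model" and DEFAULT_MODEL is configured, fill it in.
    if DEFAULT_MODEL and not payload.get("model"):
        payload["model"] = DEFAULT_MODEL
    return payload
```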