LLM Relay Server

A relay server that allows user proxies on local-network machines to talk with LLMs served on OpenRouter.

This is a FastAPI-based relay server that forwards LLM requests from a private network to an external LLM endpoint that exposes an OpenAI-compatible API (such as OpenAI or OpenRouter). It's designed to work with agentic AI frameworks that need to communicate with the endpoint but are running on machines without direct internet access.

Note that the relay only works with OpenAI-compatible APIs.

Setup

  1. Create a virtual environment using uv:
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  2. Install dependencies using uv:
uv pip install -r requirements.txt
  3. Create a .env file containing your API key for the LLM provider:
LLM_RELAY_API_KEY=your_api_key_here

Running the Server

Start the server with:

python main.py

or

uvicorn main:app --host 0.0.0.0 --port 8000

You can also pass the --base-url flag to point the relay at a specific OpenAI-compatible base URL; if omitted, it defaults to OpenAI's standard API URL.

Add the --debug flag to log HTTP payloads.
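
For example, to relay requests to OpenRouter with payload logging enabled (assuming OpenRouter's usual OpenAI-compatible base URL):

python main.py --base-url https://openrouter.ai/api/v1 --debug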

The server will be available at http://your_server_ip:8000.

Usage

The relay server exposes the following endpoints:

  • POST /v1/chat/completions: Forwards chat completion requests to the endpoint
  • POST /v1/completions: Forwards completion requests to the endpoint

You can use these endpoints just as you would an OpenAI-compatible API directly; the server handles authentication and forwards the requests upstream. A minimal client example is shown below.
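
The following is a sketch using the official openai Python client, assuming the relay is reachable at http://your_server_ip:8000 and supplies the real API key upstream (so the client-side key is just a placeholder); the model name is only an example:

from openai import OpenAI

# Point the client at the relay instead of the provider's API.
client = OpenAI(
    base_url="http://your_server_ip:8000/v1",
    api_key="placeholder",  # the relay adds the real LLM_RELAY_API_KEY
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",  # example model name; use whatever your provider serves
    messages=[{"role": "user", "content": "Hello from the private network!"}],
)
print(response.choices[0].message.content)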

Configuration

You can configure the following environment variables:

  • LLM_RELAY_API_KEY: Your API key for the upstream LLM provider
  • DEFAULT_MODEL: Default model to use if not specified in the request (optional)
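
For example, a .env file for an OpenRouter-backed setup might look like the following (the model value is only an illustration):

LLM_RELAY_API_KEY=your_api_key_here
DEFAULT_MODEL=openai/gpt-4o-mini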