Wrap Qwen Code as an OpenAI-compatible API service, so you can use the free Qwen3 Coder model through a standard API!
- ✅ 2,000 requests/day
- ✅ 60 requests/minute rate limit
- ✅ Zero cost for individual users
- 🔌 OpenAI API Compatible: Implements the `/v1/chat/completions` endpoint
- 🚀 Quick Setup: Zero-config run with `uvx`
- ⚡ High Performance: Built on FastAPI + asyncio with concurrent request support
- **Install uv**

  `uv` is an extremely fast Python package installer and resolver, written in Rust:

  ```bash
  pip install uv
  ```

- **Install dependencies**

  Clone this repository and run:

  ```bash
  uv pip install -e .
  ```
- **Install Qwen Code**

  Follow the installation guide from Qwen Code's official repository.
The first time you run `qwen`, it will guide you through an authentication process using the OAuth 2.0 device flow. This is a one-time setup.
- Browser-Based Login: The application will automatically open a new tab in your web browser, directing you to the Qwen login page.
- Authorization: Log in to your Qwen account in the browser.
After successful authorization, the application will securely store the authentication tokens in `~/.qwen/oauth_creds.json`. This allows the proxy to access your Qwen account without requiring you to log in again.
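Before starting the proxy, you can verify that the cached credentials exist. A minimal sketch; it only checks for the file mentioned above and makes no assumptions about the token field names inside it:

```python
import json
from pathlib import Path

# Location where Qwen Code caches OAuth tokens (see above).
creds_path = Path.home() / '.qwen' / 'oauth_creds.json'

if creds_path.exists():
    creds = json.loads(creds_path.read_text())
    print('Cached Qwen credentials found; fields:', sorted(creds))
else:
    print('No credentials yet; run `qwen` once to complete the OAuth flow.')
```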
Run the following command:
```bash
uv run qwen-code-proxy
```

Qwen Code Proxy listens on port `8765` by default. You can customize the startup port with the `--port` parameter.
After startup, test the service with curl:
```bash
curl http://localhost:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
    "model": "qwen3-coder-plus",
    "messages": [{"role": "user", "content": "Hello! Can you introduce yourself?"}]
  }'
```
You can also call the service with the official OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:8765/v1',
    api_key='dummy-key'  # Any string works
)

response = client.chat.completions.create(
    model='qwen3-coder-plus',
    messages=[
        {'role': 'user', 'content': 'Hello! Can you introduce yourself?'}
    ],
)

print(response.choices[0].message.content)
```
Add Model Provider in Kilo Code settings:
- API Provider: OpenAI Compatible
- API Host: `http://localhost:8765/v1`
- API Key: Any string works
- Model Name: `qwen3-coder-plus`
- Uncheck "Enable Streaming"
- Uncheck "Image Support"
- Set the "Rate limit" to "1s", because Qwen Code's rate limit is currently 60 requests per minute (a client-side throttling sketch follows this list).
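If you call the proxy from your own scripts rather than from an IDE, the same limit applies. Here is a minimal client-side throttle sketch; the base URL and model name match the examples above, and the one-second interval mirrors the Kilo Code setting:

```python
import time

from openai import OpenAI

client = OpenAI(base_url='http://localhost:8765/v1', api_key='dummy-key')

MIN_INTERVAL = 1.0  # seconds; 60 requests/minute allows at most one per second
_last_call = 0.0

def throttled_chat(prompt: str) -> str:
    """Send one chat request, sleeping as needed to respect the rate limit."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    response = client.chat.completions.create(
        model='qwen3-coder-plus',
        messages=[{'role': 'user', 'content': prompt}],
    )
    return response.choices[0].message.content

print(throttled_chat('Hello! Can you introduce yourself?'))
```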
View command-line parameters:

```bash
qwen-code-proxy --help
```
Available options:

- `--host`: Server host address (default: `127.0.0.1`)
- `--port`: Server port (default: `8765`)
- `--rate-limit`: Max requests per minute (default: `60`)
- `--max-concurrency`: Max concurrent subprocesses (default: `4`)
- `--timeout`: Qwen Code Proxy command timeout in seconds (default: `30.0`)
- `--debug`: Enable debug mode (enables debug logging and file watching)
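As an example of combining these options, you can launch the proxy from a script and wait for it to come up. A sketch, assuming `uv` is on your PATH; the port and timeout values are illustrative:

```python
import socket
import subprocess
import time

# Start the proxy on a non-default port with a longer command timeout
# (both flags are documented above; the values are illustrative).
proc = subprocess.Popen(
    ['uv', 'run', 'qwen-code-proxy', '--port', '9000', '--timeout', '60']
)

# Poll until the port accepts connections; then the proxy is ready to use.
deadline = time.monotonic() + 15
while time.monotonic() < deadline:
    try:
        with socket.create_connection(('127.0.0.1', 9000), timeout=1):
            print('qwen-code-proxy is listening on port 9000')
            break
    except OSError:
        time.sleep(0.5)
else:
    proc.terminate()
    raise RuntimeError('proxy did not start within 15 seconds')
```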
MIT License
Issues and Pull Requests are welcome!
This project is a fork and adaptation of gemini-cli-proxy, originally created by William Liu.
The original tool provided an OpenAI-compatible API layer for Gemini CLI. This version has been modified to support Qwen Code instead.