Wrap Qwen Code as an OpenAI-compatible API service, so you can use the free Qwen3 Coder model through a standard API!
- ✅ 2,000 requests/day
- ✅ 60 requests/minute rate limit
- ✅ Zero cost for individual users
- 🔌 OpenAI API Compatible: Implements the `/v1/chat/completions` endpoint
- 🚀 Quick Setup: Zero-config run with `uvx`
- ⚡ High Performance: Built on FastAPI + asyncio with concurrent request support
- **Install uv**

  `uv` is an extremely fast Python package installer and resolver, written in Rust:

  ```bash
  pip install uv
  ```

- **Install dependencies**

  Clone this repository and run:

  ```bash
  uv pip install -e .
  ```
- **Install Qwen Code**

  Follow the installation guide from Qwen Code's official repository.
The first time you run `qwen`, it will guide you through an authentication process using the OAuth 2.0 device flow. This is a one-time setup.
- Browser-Based Login: The application will automatically open a new tab in your web browser, directing you to the Qwen login page.
- Authorization: Log in to your Qwen account in the browser.
After successful authorization, the application will securely store the authentication tokens in `~/.qwen/oauth_creds.json`. This allows the proxy to access your Qwen account without requiring you to log in again.
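Before starting the proxy, you can verify that the cached credentials exist. A minimal sketch; it only checks for the file mentioned above and makes no assumptions about the token field names inside it:

```python
import json
from pathlib import Path

# Location where Qwen Code caches OAuth tokens (see above).
creds_path = Path.home() / '.qwen' / 'oauth_creds.json'

if creds_path.exists():
    creds = json.loads(creds_path.read_text())
    print('Cached Qwen credentials found; fields:', sorted(creds))
else:
    print('No credentials yet; run `qwen` once to complete the OAuth flow.')
```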
Run the following command:
```bash
uv run qwen-code-proxy
```

Qwen Code Proxy listens on port `8765` by default. You can customize the startup port with the `--port` parameter.
After startup, test the service with curl:
```bash
curl http://localhost:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
    "model": "qwen3-coder-plus",
    "messages": [{"role": "user", "content": "Hello! Can you introduce yourself?"}]
  }'
```
You can also call the service with the official OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:8765/v1',
    api_key='dummy-key'  # Any string works
)

response = client.chat.completions.create(
    model='qwen3-coder-plus',
    messages=[
        {'role': 'user', 'content': 'Hello! Can you introduce yourself?'}
    ],
)

print(response.choices[0].message.content)
```
Add Model Provider in Kilo Code settings:
- API Provider: OpenAI Compatible
- API Host: `http://localhost:8765/v1`
- API Key: Any string works
- Model Name: `qwen3-coder-plus`
- Uncheck "Enable Streaming"
- Uncheck "Image Support"
- Set the "Rate limit" to "1s", because Qwen Code's rate limit is currently 60 requests per minute (a client-side throttling sketch follows this list).
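If you call the proxy from your own scripts rather than from an IDE, the same limit applies. Here is a minimal client-side throttle sketch; the base URL and model name match the examples above, and the one-second interval mirrors the Kilo Code setting:

```python
import time

from openai import OpenAI

client = OpenAI(base_url='http://localhost:8765/v1', api_key='dummy-key')

MIN_INTERVAL = 1.0  # seconds; 60 requests/minute allows at most one per second
_last_call = 0.0

def throttled_chat(prompt: str) -> str:
    """Send one chat request, sleeping as needed to respect the rate limit."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    response = client.chat.completions.create(
        model='qwen3-coder-plus',
        messages=[{'role': 'user', 'content': prompt}],
    )
    return response.choices[0].message.content

print(throttled_chat('Hello! Can you introduce yourself?'))
```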
View command-line parameters:

```bash
qwen-code-proxy --help
```
Available options:

- `--host`: Server host address (default: `127.0.0.1`)
- `--port`: Server port (default: `8765`)
- `--rate-limit`: Max requests per minute (default: `60`)
- `--max-concurrency`: Max concurrent subprocesses (default: `4`)
- `--timeout`: Qwen Code Proxy command timeout in seconds (default: `30.0`)
- `--debug`: Enable debug mode (enables debug logging and file watching)
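As an example of combining these options, you can launch the proxy from a script and wait for it to come up. A sketch, assuming `uv` is on your PATH; the port and timeout values are illustrative:

```python
import socket
import subprocess
import time

# Start the proxy on a non-default port with a longer command timeout
# (both flags are documented above; the values are illustrative).
proc = subprocess.Popen(
    ['uv', 'run', 'qwen-code-proxy', '--port', '9000', '--timeout', '60']
)

# Poll until the port accepts connections; then the proxy is ready to use.
deadline = time.monotonic() + 15
while time.monotonic() < deadline:
    try:
        with socket.create_connection(('127.0.0.1', 9000), timeout=1):
            print('qwen-code-proxy is listening on port 9000')
            break
    except OSError:
        time.sleep(0.5)
else:
    proc.terminate()
    raise RuntimeError('proxy did not start within 15 seconds')
```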
MIT License
Issues and Pull Requests are welcome!
This project is a fork and adaptation of gemini-cli-proxy, originally created by William Liu.
The original tool provided an OpenAI-compatible API layer for Gemini CLI. This version has been modified to support Qwen Code instead.