Qwen Code Proxy

Python 3.12+ | MIT License

Wrap Qwen Code as an OpenAI-compatible API service, allowing you to enjoy the free Qwen3 Coder model through API!

  • 2,000 requests/day
  • 60 requests/minute rate limit
  • Zero cost for individual users

✨ Features

  • 🔌 OpenAI API Compatible: Implements /v1/chat/completions endpoint
  • 🚀 Quick Setup: Zero-config run with uvx
  • ⚡ High Performance: Built on FastAPI + asyncio with concurrent request support (see the sketch below)
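
To make that concrete, here is a minimal illustrative sketch of an OpenAI-compatible endpoint built the same way: a FastAPI handler guarded by an asyncio semaphore. This is not the project's actual source; the stubbed reply stands in for the real call into Qwen Code.

import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
semaphore = asyncio.Semaphore(4)  # mirrors the --max-concurrency default

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]

@app.post('/v1/chat/completions')
async def chat_completions(req: ChatRequest):
    async with semaphore:  # cap concurrent work, as the proxy caps subprocesses
        # Placeholder: the real proxy invokes the Qwen Code CLI here
        reply = f"(stub reply to: {req.messages[-1]['content']})"
    return {
        'object': 'chat.completion',
        'model': req.model,
        'choices': [{
            'index': 0,
            'message': {'role': 'assistant', 'content': reply},
            'finish_reason': 'stop',
        }],
    }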

📦 Installation

  1. Install uv

    uv is an extremely fast Python package installer and resolver, written in Rust.

    pip install uv
  2. Install dependencies

    Clone this repository and run:

    uv pip install -e .

🚀 Quick Start

Install Qwen Code

Follow the installation guide from Qwen Code's official repository.

🔑 Authentication

The first time you run qwen, it will guide you through an authentication process using the OAuth 2.0 device flow. This is a one-time setup.

  1. Browser-Based Login: The application will automatically open a new tab in your web browser, directing you to the Qwen login page.
  2. Authorization: Log in to your Qwen account in the browser.

After successful authorization, the application will securely store the authentication tokens in ~/.qwen/oauth_creds.json. This allows the proxy to access your Qwen account without requiring you to log in again.
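
If you want to confirm that cached credentials are in place before starting the proxy, a quick check like the one below can help. The field name expiry_date (and its unit, epoch milliseconds) is an assumption about the file's layout, not a documented schema:

import json
import time
from pathlib import Path

creds_path = Path.home() / '.qwen' / 'oauth_creds.json'
if creds_path.exists():
    creds = json.loads(creds_path.read_text())
    expiry_ms = creds.get('expiry_date', 0)  # assumed field: epoch milliseconds
    print('Token found; still fresh:', expiry_ms / 1000 > time.time())
else:
    print('No credentials found -- run `qwen` once to authenticate.')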

Start Qwen Code Proxy

Run the following command:

uv run qwen-code-proxy

Qwen Code Proxy listens on port 8765 by default. You can change the listening port with the --port option.
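
For example, to listen on port 9000:

uv run qwen-code-proxy --port 9000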

After startup, test the service with curl:

curl http://localhost:8765/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
    "model": "qwen3-coder-plus",
    "messages": [{"role": "user", "content": "Hello! Can you introduce your self?"}]
  }'

Usage Examples

OpenAI Client

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:8765/v1',
    api_key='dummy-key'  # Any string works
)

response = client.chat.completions.create(
    model='qwen3-coder-plus',
    messages=[
        {'role': 'user', 'content': 'Hello! Can you introduce yourself?'}
    ],
)

print(response.choices[0].message.content)
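
Because the upstream quota is 60 requests per minute, scripted clients can be rejected when they exceed it. The sketch below retries with exponential backoff; it assumes rejections surface as openai.RateLimitError, which may differ in your setup:

import time

from openai import OpenAI, RateLimitError

client = OpenAI(base_url='http://localhost:8765/v1', api_key='dummy-key')

def chat_with_retry(prompt: str, retries: int = 3) -> str:
    for attempt in range(retries):
        try:
            response = client.chat.completions.create(
                model='qwen3-coder-plus',
                messages=[{'role': 'user', 'content': prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s
    raise RuntimeError('Rate limit persisted after retries')

print(chat_with_retry('Write a haiku about proxies.'))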

Kilo Code Integration

Add Model Provider in Kilo Code settings:

  • API Provider: OpenAI Compatible
  • API Host: http://localhost:8765/v1
  • API Key: Any string works
  • Model Name: qwen3-coder-plus
  • Uncheck "Enable Streaming"
  • Uncheck "Image Support"
  • Set "Rate limit" to "1s", since Qwen Code's current rate limit is 60 requests per minute.

⚙️ Configuration Options

View command line parameters:

qwen-code-proxy --help

Available options:

  • --host: Server host address (default: 127.0.0.1)
  • --port: Server port (default: 8765)
  • --rate-limit: Max requests per minute (default: 60)
  • --max-concurrency: Max concurrent subprocesses (default: 4)
  • --timeout: Timeout in seconds for the underlying Qwen Code command (default: 30.0)
  • --debug: Enable debug mode (enables debug logging and file watching)
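
For example, to expose the proxy on your LAN with a lower rate limit and a longer command timeout:

uv run qwen-code-proxy --host 0.0.0.0 --rate-limit 30 --timeout 60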

📄 License

MIT License

🤝 Contributing

Issues and Pull Requests are welcome!

🏗️ Origin & Attribution

This project is a fork and adaptation of gemini-cli-proxy, originally created by William Liu.

The original tool provided an OpenAI-compatible API layer for Gemini CLI. This version has been modified to support Qwen Code instead.