
Building a Multi-Model LLM Proxy Service with tio-boot

This document describes how to quickly build a multi-model large language model proxy service on the tio-boot framework. The proxy accepts client requests (in both streaming and non-streaming modes), forwards them through a unified interface to APIs such as OpenAI, Anthropic Claude, and Google Gemini, and returns the results to the client as either an HTTP response or an SSE (Server-Sent Events) stream.

1. Overview

  • Use cases: when you need unified access to multiple LLM services, this proxy addresses the following problems:

    1. Network policy restrictions (e.g. no direct access to the API domains)
    2. Unified authentication and key management
    3. Multi-model routing and forwarding
    4. Conversion between streaming and non-streaming response formats
  • Core functionality

    1. The client sends requests to the local tio-boot service (e.g. http://127.0.0.1/***)
    2. Requests are routed to the matching API service (OpenAI/Anthropic/Google) based on the request path
    3. SSE streaming or a synchronous HTTP response is chosen based on the stream field in the request body
    4. The API result (JSON or an SSE stream) is adapted and returned to the client

2. Integrating into a TioBoot Project

Build (Windows commands shown):

set JAVA_HOME=D:\java\jdk-1.8_411
mvn clean install -DskipTests -Dgpg.skip -q

Add the Maven dependency:

    <dependency>
      <groupId>com.litongjava</groupId>
      <artifactId>java-llm-proxy</artifactId>
      <version>1.0.0</version>
    </dependency>

Configure (call during application startup):

import com.litongjava.llm.proxy.config.LLMProxyAppConfig;

new LLMProxyAppConfig().config();
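
For reference, a minimal sketch of where this call fits in a tio-boot application, assuming LLMProxyAppConfig implements BootConfiguration in the same way as the LLMProxyConfig class shown in section 5.3 (the MyApp class name is a placeholder):

import com.litongjava.llm.proxy.config.LLMProxyAppConfig;
import com.litongjava.tio.boot.TioApplication;

public class MyApp {
  public static void main(String[] args) {
    // Passing the config object registers the proxy routes at startup
    TioApplication.run(MyApp.class, new LLMProxyAppConfig(), args);
  }
}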

3. Prerequisites

  1. JDK: Java 8 or later

  2. Build tool: Maven or Gradle

  3. Core dependencies (pom.xml example):

<dependency>
  <groupId>com.litongjava</groupId>
  <artifactId>tio-boot-admin</artifactId>
  <version>1.0.5</version>
</dependency>
<dependency>
  <groupId>com.alibaba.fastjson2</groupId>
  <artifactId>fastjson2</artifactId>
  <version>2.0.30</version>
</dependency>
<dependency>
  <groupId>org.projectlombok</groupId>
  <artifactId>lombok</artifactId>
  <version>1.18.26</version>
  <scope>provided</scope>
</dependency>

4. Project Structure

llm-proxy-app/
├─ src/
│  ├─ main/
│  │  ├─ java/
│  │  │  ├─ com/litongjava/llm/proxy/
│  │  │  │  ├─ LLMProxyApp.java          # Application entry point
│  │  │  ├─ com/litongjava/llm/proxy/config/
│  │  │  │  └─ LLMProxyConfig.java       # Route configuration
│  │  │  ├─ com/litongjava/llm/proxy/handler/
│  │  │  │  └─ LLMProxyHandler.java      # Core request handler
│  │  │  ├─ com/litongjava/llm/proxy/callback/
│  │  │  │  └─ SSEProxyCallbackEventSourceListener.java  # SSE callback handling
│  │  └─ resources/
│  │     └─ app.properties               # Configuration file
└─ pom.xml

5. Key Implementation

5.1 LLMProxyHandler.java - Request Routing Handler

package com.litongjava.llm.proxy.handler;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.fastjson2.JSONObject;
import com.litongjava.claude.ClaudeClient;
import com.litongjava.gemini.GeminiClient;
import com.litongjava.llm.proxy.callback.SSEProxyCallbackEventSourceListener;
import com.litongjava.model.body.RespBodyVo;
import com.litongjava.openai.client.OpenAiClient;
import com.litongjava.proxy.AiChatProxyClient;
import com.litongjava.tio.boot.http.TioRequestContext;
import com.litongjava.tio.core.ChannelContext;
import com.litongjava.tio.http.common.HttpRequest;
import com.litongjava.tio.http.common.HttpResponse;
import com.litongjava.tio.http.common.utils.HttpIpUtils;
import com.litongjava.tio.http.server.util.CORSUtils;
import com.litongjava.tio.utils.environment.EnvUtils;
import com.litongjava.tio.utils.hutool.StrUtil;
import com.litongjava.tio.utils.json.FastJson2Utils;

import lombok.extern.slf4j.Slf4j;
import okhttp3.Response;
import okhttp3.sse.EventSourceListener;

@Slf4j
public class LLMProxyHandler {

  public HttpResponse completions(HttpRequest httpRequest) {
    long start = System.currentTimeMillis();
    HttpResponse httpResponse = TioRequestContext.getResponse();
    CORSUtils.enableCORS(httpResponse);

    String requestURI = httpRequest.getRequestURI();

    String bodyString = httpRequest.getBodyString();

    if (StrUtil.isBlank(bodyString)) {
      return httpResponse.setJson(RespBodyVo.fail("empty body"));
    }

    String realIp = HttpIpUtils.getRealIp(httpRequest);
    log.info("from:{},requestURI:{}", realIp, requestURI);
    Boolean stream = false;
    String url = null;
    Map<String, String> headers = new HashMap<>();
    if (requestURI.startsWith("/openai")) {
      url = OpenAiClient.OPENAI_API_URL + "/chat/completions";
      headers.put("authorization", httpRequest.getAuthorization());

      // bodyString is guaranteed non-blank here, so parse it directly for the stream flag
      JSONObject requestVo = FastJson2Utils.parseObject(bodyString);
      stream = requestVo.getBoolean("stream");

    } else if (requestURI.startsWith("/anthropic")) {
      url = ClaudeClient.CLAUDE_API_URL + "/messages";
      headers.put("x-api-key", httpRequest.getHeader("x-api-key"));
      headers.put("anthropic-version", httpRequest.getHeader("anthropic-version"));

      JSONObject requestVo = FastJson2Utils.parseObject(bodyString);
      stream = requestVo.getBoolean("stream");
      
    } else if (requestURI.startsWith("/google")) {
      String key = httpRequest.getParam("key");
      // Extract the model name between the last '/' and the ':' action suffix,
      // e.g. /google/v1beta/models/gemini-2.5-flash:generateContent -> gemini-2.5-flash
      String modelName = requestURI.substring(requestURI.lastIndexOf('/') + 1, requestURI.indexOf(':'));
      if (requestURI.endsWith("streamGenerateContent")) {
        url = GeminiClient.GEMINI_API_URL + modelName + ":streamGenerateContent?alt=sse&key=" + key;
        stream = true;
      } else {
        url = GeminiClient.GEMINI_API_URL + modelName + ":generateContent?key=" + key;
      }
    }

    if (url == null) {
      return httpResponse.setJson(RespBodyVo.fail("unsupported request uri: " + requestURI));
    }

    if (stream != null && stream) {
      // Tell the default handler not to send the response body; it will be sent manually via SSE
      httpResponse.setSend(false);
      ChannelContext channelContext = httpRequest.getChannelContext();
      EventSourceListener openAIProxyCallback = new SSEProxyCallbackEventSourceListener(channelContext, httpResponse, start);
      AiChatProxyClient.stream(url, headers, bodyString, openAIProxyCallback);
    } else {
      try (Response response = AiChatProxyClient.generate(url, headers, bodyString)) {
        try {
          // string() reads and closes the response body; it can only be called once
          String string = response.body().string();
          httpResponse.setString(string, "utf-8", "application/json");
          if (EnvUtils.getBoolean("app.debug", false)) {
            log.info("chat:{},{}", bodyString, string);
          }
        } catch (IOException e) {
          log.error("failed to read upstream response", e);
        }
        long end = System.currentTimeMillis();
        log.info("finish llm in {} ms", end - start);
      }
      }
    }

    return httpResponse;
  }
}

Functionality

  1. Multi-model routing: requests are routed to the matching service by URL prefix (/openai, /anthropic, /google); the Gemini path parsing is sketched after this list
  2. Header handling: extracts and forwards each platform's specific authentication headers (Authorization / x-api-key)
  3. Streaming detection: the stream field in the request body determines the response mode
  4. Response conversion: non-streaming mode returns JSON directly; streaming mode is handled through the SSE callback
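
The Gemini branch derives the model name from the request path itself. Here is a standalone sketch of that parsing step, matching the sketch referenced in point 1 above (the URI value is just an example):

String requestURI = "/google/v1beta/models/gemini-2.5-flash:streamGenerateContent";
// take the text between the last '/' and the ':' action suffix
String modelName = requestURI.substring(requestURI.lastIndexOf('/') + 1, requestURI.indexOf(':'));
// modelName is now "gemini-2.5-flash"; the action after ':' chooses streaming vs. synchronous mode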

5.2 SSEProxyCallbackEventSourceListener.java - SSE Callback Handler

package com.litongjava.llm.proxy.callback;

import java.io.IOException;

import com.jfinal.kit.StrKit;
import com.litongjava.tio.core.ChannelContext;
import com.litongjava.tio.core.Tio;
import com.litongjava.tio.http.common.HttpResponse;
import com.litongjava.tio.http.common.sse.SsePacket;
import com.litongjava.tio.utils.SystemTimer;

import lombok.extern.slf4j.Slf4j;
import okhttp3.Response;
import okhttp3.sse.EventSource;
import okhttp3.sse.EventSourceListener;

@Slf4j
public class SSEProxyCallbackEventSourceListener extends EventSourceListener {

  private ChannelContext channelContext;
  private HttpResponse httpResponse;
  private long start;
  private boolean continueSend = true;

  public SSEProxyCallbackEventSourceListener(ChannelContext channelContext, HttpResponse httpResponse, long start) {
    this.channelContext = channelContext;
    this.httpResponse = httpResponse;
    this.start = start;
  }

  @Override
  public void onOpen(EventSource eventSource, Response response) {
    httpResponse.addServerSentEventsHeader();
    httpResponse.setSend(true);
    Tio.send(channelContext, httpResponse);
  }

  @Override
  public void onEvent(EventSource eventSource, String id, String type, String data) {
    if (StrKit.notBlank(data)) {
      sendPacket(new SsePacket(type, data.getBytes()));
      // [DONE] is OpenAI's end-of-stream marker
      if ("[DONE]".equals(data)) {
        finish(eventSource);
        return;
      }
    }

  }

  @Override
  public void onClosed(EventSource eventSource) {
    finish(eventSource);
  }

  @Override
  public void onFailure(EventSource eventSource, Throwable t, Response response) {
    // Both t and response may be null in okhttp-sse's onFailure callback
    if (t != null) {
      log.error(t.getMessage(), t);
    }
    if (response != null) {
      try {
        int code = response.code();
        String string = response.body().string();
        httpResponse.status(code);
        httpResponse.body(string);

        httpResponse.setSend(true);
        Tio.send(channelContext, httpResponse);
      } catch (IOException e) {
        log.error("failed to read upstream error response", e);
      } finally {
        response.close();
      }
    }

    finish(eventSource);
  }

  private void finish(EventSource eventSource) {
    log.info("elapse:{}", SystemTimer.currTime - start);
    eventSource.cancel();
    Tio.close(channelContext, "finish");
  }

  /** Try to send an SSE packet up to three times; give up once the connection is broken. */
  private void sendPacket(SsePacket packet) {
    if (!continueSend) {
      return;
    }
    for (int i = 0; i < 3; i++) {
      if (Tio.bSend(channelContext, packet)) {
        return;
      }
    }
    continueSend = false;
  }
}

Core mechanisms

  1. SSE initialization: onOpen sets Content-Type: text/event-stream and establishes the connection
  2. Streaming data forwarding: onEvent relays each received data chunk to the client in real time (a client-side consumption sketch follows this list)
  3. Termination handling: the [DONE] marker is recognized and the connection is closed
  4. Error handling: on API errors, the original status code and error body are passed through
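
To exercise the streaming path from Java, here is a minimal sketch of a client consuming the proxy's SSE output with okhttp-sse; the SseClientDemo class name, URL, and key placeholder are illustrative only:

import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.sse.EventSource;
import okhttp3.sse.EventSourceListener;
import okhttp3.sse.EventSources;

public class SseClientDemo {
  public static void main(String[] args) {
    OkHttpClient client = new OkHttpClient();
    String body = "{\"model\":\"gpt-3.5-turbo\",\"stream\":true,"
        + "\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}]}";
    Request request = new Request.Builder()
        .url("http://localhost/openai/v1/chat/completions")
        .header("Authorization", "Bearer YOUR_API_KEY")
        .post(RequestBody.create(MediaType.parse("application/json"), body))
        .build();
    EventSources.createFactory(client).newEventSource(request, new EventSourceListener() {
      @Override
      public void onEvent(EventSource eventSource, String id, String type, String data) {
        // Each chunk arrives exactly as the proxy forwarded it
        System.out.println(data);
      }

      @Override
      public void onClosed(EventSource eventSource) {
        // Shut down the dispatcher threads so the JVM can exit
        client.dispatcher().executorService().shutdown();
      }
    });
  }
}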

5.3 LLMProxyConfig.java - Route Configuration

package com.litongjava.llm.proxy.config;

import com.litongjava.context.BootConfiguration;
import com.litongjava.llm.proxy.handler.LLMProxyHandler;
import com.litongjava.tio.boot.server.TioBootServer;
import com.litongjava.tio.http.server.router.HttpRequestRouter;

public class LLMProxyConfig implements BootConfiguration {

  @Override
  public void config() {
    TioBootServer server = TioBootServer.me();
    HttpRequestRouter requestRouter = server.getRequestRouter();

    LLMProxyHandler llmProxyHandler = new LLMProxyHandler();
    requestRouter.add("/openai/v1/chat/completions", llmProxyHandler::completions);
    requestRouter.add("/anthropic/v1/messages", llmProxyHandler::completions);
    requestRouter.add("/google/v1beta/models/*", llmProxyHandler::completions);
  }
}

Configuration notes

  1. Unified entry point: the different API paths share one handler method
  2. Wildcard support: the Google model path uses a * wildcard (see the sketch after this list)
  3. Auto-wiring: the BootConfiguration interface registers the routes at startup
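
As an illustration of the wildcard style, an additional route can be registered through the same handler inside config(); the path below is hypothetical and not part of the shipped configuration:

// Hypothetical extra wildcard route, added inside LLMProxyConfig.config()
requestRouter.add("/google/v1/models/*", llmProxyHandler::completions);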

5.4 LLMProxyApp.java - Application Entry Point

package com.litongjava.llm.proxy;

import com.litongjava.llm.proxy.config.LLMProxyConfig;
import com.litongjava.tio.boot.TioApplication;

public class LLMProxyApp {
  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    TioApplication.run(LLMProxyApp.class, new LLMProxyConfig(), args);
    long end = System.currentTimeMillis();
    System.out.println("started in " + (end - start) + " ms");
  }
}

6. Starting and Testing the Service

6.1 Start the Service

mvn clean package -DskipTests
java -jar target/llm-proxy-app-1.0.0.jar

6.2 Multi-Model Test Examples

OpenAI non-streaming test

curl -X POST http://localhost/openai/v1/chat/completions \
  -H "Authorization: Bearer sk-proj-o" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role":"system","content":"Just say hi"}],
    "model": "gpt-3.5-turbo",
    "stream": false
  }'

Google Gemini streaming test

curl -X POST 'http://localhost/google/v1beta/models/gemini-2.5-flash:streamGenerateContent?key=API_KEY' \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"role": "user", "parts": [{"text": "hi"}]}]
  }'

Anthropic Claude streaming test

curl -X POST http://localhost/anthropic/v1/messages \
  -H "x-api-key: YOUR_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Just say hi"}],
    "stream": true,
    "model":"claude-3-7-sonnet-20250219"
  }'

7. Notes

  1. Response body is read once

    • In non-streaming mode, response.body().string() reads the body and closes the stream
    • In streaming mode, avoid reading the response body more than once
  2. Connection management

    • Reuse a single OkHttpClient globally so the connection pool stays effective (see the sketch after this list)
    • Close the channel explicitly when an SSE connection ends
  3. Error handling

    • On API errors, pass through the original status code and message body
    • On network exceptions, log the error and close the connection
  4. Debugging and performance

    • Enable debug logging: set app.debug=true in app.properties
    • Monitor request handling time: record timestamps at key points
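
A minimal sketch of the client-reuse advice, assuming you manage the OkHttpClient yourself (the HttpClientHolder class is hypothetical; AiChatProxyClient may already manage its own client internally):

import java.util.concurrent.TimeUnit;

import okhttp3.OkHttpClient;

public final class HttpClientHolder {
  // One shared instance: the connection pool and dispatcher threads are reused across requests
  public static final OkHttpClient CLIENT = new OkHttpClient.Builder()
      .connectTimeout(10, TimeUnit.SECONDS)
      .readTimeout(0, TimeUnit.SECONDS) // a zero read timeout keeps long-lived SSE streams open
      .build();

  private HttpClientHolder() {
  }
}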

8. Technical Summary

Built on the tio-boot framework, this proxy service implements the following core functionality:

  1. Unified access to multiple models

    • OpenAI: the standard Chat Completions API
    • Anthropic Claude: the Messages API
    • Google Gemini: generateContent / streamGenerateContent
  2. Dual response modes

    • Synchronous mode: returns the complete JSON response directly
    • Streaming mode: relays data chunks in real time over SSE
  3. Efficient routing

    • Path prefixes select the target API service
    • Wildcards handle dynamic model paths
    • A single handler method serves all routes

With this approach you can quickly build a unified proxy for the major language models, work around API access restrictions, and offer a consistent development experience.