xusenlinzy/api-for-open-llm

OpenAI-style API for open large language models: use LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend interface for open-source large models.

Python · Apache-2.0

Issues

💡 [REQUEST] - Cohere embed support
#178 opened 3 months ago
0
Could you consider adding the llama.cpp inference engine?
#177 opened 7 months ago
3
💡 [REQUEST] - Are all of the models in template supported?
#176 opened 7 months ago
1
💡 [REQUEST] - Support more models with vllm
#175 opened 7 months ago
1
💡 [REQUEST] - <Request to add OrionStar-Yi-34B-Chat, a chat model based on Yi-34B-Base>
#174 opened 7 months ago
1
TypeError: '>=' not supported between instances of 'RuntimeError' and 'int'
#173 opened 3 months ago
7
When using Baichuan2, responses often contain the word "human", which hurts the user experience. Can this be fixed?
#171 opened 8 months ago
4
Is loading multiple LLM models at startup supported?
#170 opened 7 months ago
1
Updated to the latest version, but streaming output still never stops
#169 opened 7 months ago
3
The latest code still does not support conversations with "role":"assistant" for the qwen-14b model
#168 opened 8 months ago
1
Is there a stable release? I only need chatglm2-6b and m3e-base for now, and the master I pulled has some bugs
#167 opened 8 months ago
4
Help needed: how do I run the non-chat baichuan1 7b?
#166 opened 8 months ago
5
Using the embeddings endpoint exposed by api-for-open-llm in dify, cleaning fails with errors for some documents
#165 opened 8 months ago
1
💡 [REQUEST] - <title>Please support chat2DB
#163 opened a month ago
0
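In use, I found that the model cannot run concurrent inference. Is my configuration wrong, or is this a functional limitation? How can I enable concurrent inference?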
#162 opened 8 months ago
2
Using function call to invoke tools: tools cannot be called when stream is True, but can be called when it is False
#161 opened 8 months ago
2
💡 [REQUEST] - Are there plans to add support for autogen?
#160 opened a month ago
1
💡 [REQUEST] - <title>Please support the Qwen-Agent interface, thanks!!!
#159 opened a month ago
0
💡 [REQUEST] - <title>How can the cross-origin (CORS) problem be solved?
#158 opened 8 months ago
2
When integrating with an OpenAI application, setting the role parameter to system raises an error; hoping for compatibility with baichuan's assistant
#157 opened 8 months ago
1
💡 [REQUEST] - How do I run on multiple GPUs?
#156 opened 8 months ago
2
Tongyi Qianwen Qwen-14B-Chat wrapped API: kv cache stays stuck at 94.1% with no response for a long time, and then other services cannot get in
#155 opened 8 months ago
2
💡 [REQUEST] - Support XuanYuan-70B-Chat-4bit / 8bit
#154 opened 7 months ago
0
Streaming output with Qwen14B never stops
#153 opened 8 months ago
3
💡 [REQUEST] - Support for CogVLM, the vision-language model that accompanies ChatGLM3
#152 opened a month ago
0
💡 [REQUEST] - Support chatglm3-6B
#151 opened 8 months ago
3
It runs on GPU0 by default; how can I configure it to run on GPU1?
#150 opened 8 months ago
1
Qwen-14B-Chat-Int4 responds much more slowly than the unquantized version
#149 opened 6 months ago
3
When QWEN is started with vllm, output is not truncated at |endoftext|
#147 opened 8 months ago
2
With a locally deployed model configured according to the project docs, why does AgentType.OPENAI_FUNCTIONS call the remote openai API?
#146 opened 8 months ago
0
💡 [REQUEST] - <title> Can the Qwen-14B-Chat-Int4 model run on the vllm image?
#145 opened 8 months ago
1
[Question] ChatGLM2 with vllm inference acceleration raises AttributeError: 'ChatGLMConfig' object has no attribute 'num_hidden_layers'
#144 opened 9 months ago
3
Running pip install -r requirements.txt in a conda virtual environment
#143 opened 9 months ago
2
💡 [REQUEST] - <title> Please update support for the wizardLM series!
#142 opened 7 months ago
3
Model support: suggest adding Microsoft's microsoft/phi-1_5
#141 opened a month ago
0
Error when building with the vllm docker image
#140 opened 9 months ago
1
Are qwen-7b models quantized with https://github.com/PanQiWei/AutoGPTQ supported?
#139 opened 9 months ago
5
💡 [REQUEST] - <title>Is it possible to run only the embedding model without starting the LLM?
#138 opened 9 months ago
2
With transformers version 4.34.0, startup fails with the error 'ChatGLMTokenizer' object has no attribute 'tokenizer'
#137 opened 9 months ago
2
Why does the model answer incorrectly after LoRA fine-tuning when the question differs only slightly from the original?
#136 opened 7 months ago
0
Qwen-14B-Chat-Int4 loading error
#135 opened 9 months ago
1
💡 [REQUEST] - Need support for Qwen-14B-chat
#134 opened 9 months ago
3
Hello, is InternLM (书生) 20B supported?
#133 opened 9 months ago
1
This project is very valuable, many thanks! 💡 [REQUEST] - <title>
#132 opened 9 months ago
0
Hangs after GPU KV cache usage: 100.0%?
#131 opened 8 months ago
3
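baichuan2-13b dual-GPU deployment: GPU memory is split evenly according to the smaller card. Current state: the memory actually used is insufficient and the service cannot work properly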
#130 opened 6 months ago
0
💡 [REQUEST] - <Is InternLM-20B started the same way as InternLM-7B?>
#129 opened 9 months ago
1
Passed max_token==4000 in the request, but each chat response is only about 500
#128 opened 9 months ago
3
Starting the baichuan2-7b model with vllm produces garbled replies
#127 opened 9 months ago
2
💡 [REQUES"text-embedding-ada-002T] - <title>
#126 opened 6 months ago
1