xusenlinzy/api-for-open-llm
OpenAI-style API for open large language models: use open LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend API for open-source large language models.
Python · Apache-2.0
Issues
Deploying glm4-9b on 4×4090 GPUs: API calls from Dify return errors
#315 opened by he498 - 3
In TASKS=llm,rag mode, a threading error occurs: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
#308 opened by syusama - 2
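The RuntimeError above is raised when CUDA is initialized in the parent process and a worker is then forked. A minimal sketch of the workaround the message itself suggests, forcing Python's 'spawn' start method (generic Python, not code from this repository):

```python
import multiprocessing as mp

# 'fork' (the Linux default) copies the parent's already-initialized
# CUDA context into the child, which CUDA forbids; 'spawn' starts a
# fresh interpreter instead. Call this before any CUDA work happens.
mp.set_start_method("spawn", force=True)

print(mp.get_start_method())  # → spawn
```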
Output is truncated under this framework's vllm backend, but not when launching with official vLLM or running the model with transformers
#314 opened by TLL1213 - 0
Running streamlit_app.py raises an error
#310 opened by louan1998 - 3
Problem using the Qwen2-7B-Instruct model with vLLM
#303 opened by Empress7211 - 1
glm4 connected to Dify cannot trigger tool use
#288 opened by he498 - 1
Running glm4v: requests return errors
#311 opened by 760485464 - 0
not support sglang backend
#309 opened by colinsongf - 3
Error running under Docker: multiproc_worker_utils.py:226] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
#305 opened by syusama - 0
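For the same CUDA re-initialization error inside vLLM's own multiprocessing workers (note the multiproc_worker_utils.py path in the title), vLLM exposes an environment variable to switch its worker start method; a sketch, assuming it is set in the container environment before the server starts:

```shell
# Ask vLLM to start its multiprocessing workers with 'spawn'
# instead of 'fork'; must be set before launching the server.
export VLLM_WORKER_MULTIPROC_METHOD=spawn
```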
Deploying gte-qwen2-1.5b-instruct: requests to the rerank endpoint fail
#307 opened by cowcomic - 10
glm-4v starts normally but inference requests fail
#291 opened by 760485464 - 9
python: can't open file '/workspace/api/server.py': [Errno 2] No such file or directory, when deploying Qwen2-72B-Instruct-GPTQ-Int4 with docker-compose on Ubuntu
#304 opened by syusama - 1
llama3-8B keeps talking to itself after answering and never stops
#296 opened by yd9038074 - 2
minicpm starts fine, but inference requests fail
#292 opened by 760485464 - 0
[embedding] Are the latest SOTA models unsupported? KeyError: 'Could not automatically map text2vec-base-multilingual to a tokeniser.
#297 opened by ForgetThatNight - 0
Doc chat raises FileNotFoundError: Table does not exist. Please first call db.create_table(, data)
#299 opened by Weiqiang-Li - 5
qwen2 inference error
#293 opened by wj1017090777 - 5
Running Qwen2-7B with api-for-open-llm & vllm on multiple GPUs fails with GPU memory full
#290 opened by Woiea - 2
SQL chat raises a ProgrammingError
#277 opened by songyao199681 - 2
Using streamer_v2 produces garbled output
#287 opened by Tendo33 - 1
"POST /v1/files HTTP/1.1" 404 Not Found
#286 opened by KEAI404 - 2
The model I want to use is not in the supported-model list; does that mean this project cannot expose an OpenAI-style API for it?
#258 opened by xiaoma444 - 4
Inference with qwen2-72B-AWQ using the latest vllm image fails
#285 opened by Tendo33 - 1
Docker cannot pull the image
#284 opened by xqinshan - 2
API request error: TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
#282 opened by syusama - 12
Deploying vllm with Docker returns 404 Not Found
#271 opened by skyliwq - 14
Inference error in vllm mode
#279 opened by yeehua-cn - 0
Cannot run instruction.py
#280 opened by NCCurry30 - 5
EMBEDDING_API_BASE is not picked up: str expected, not NoneType
#270 opened by chukangkang - 2
baichuan2-13b-chat gives garbled answers and cannot produce code
#261 opened by guiniao - 1
I have deployed many models; is there a web UI that lets me call all of them for inference from one place?
#278 opened by Tendo33 - 4
Local vllm deployment: the vllm engine fails to start
#274 opened by Ruibn - 0
When will the Qwen 1.5 function-call feature be fixed?
#273 opened by skyliwq - 2
Icon not found
#265 opened by lucheng07082221 - 0
💡 vLLM now supports pipeline parallelism (pipeline parallel), which can greatly increase throughput; could the author add vLLM pipeline-parallel support?
#269 opened by CaptainLeezz - 0
lifespan not work, cache not cleared
#260 opened by Yimi81 - 1
Dependency error in the vllm container
#268 opened by Tendo33 - 2
llama3 answers never stop after a question
#266 opened by gptcod - 3
Running internlm2 fails with missing weight files that the model does not provide
#263 opened by 760485464 - 1
Bug concerning SETTINGS = Settings() in api/config.py
#262 opened by Tendo33 - 3
functions have no effect when MODEL_NAME=qwen2
#257 opened by liuyi1213812 - 1
ValueError: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (15248). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
#259 opened by guiniao
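The ValueError in the last issue names its own two knobs. A sketch of a vLLM OpenAI-server launch adjusting them (the model name and the specific values are illustrative, not taken from the issue):

```shell
# Either give vLLM a larger share of GPU memory for the KV cache,
# or cap the context length so the KV cache fits; both flags are
# standard vLLM engine arguments.
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen2-7B-Instruct \
  --gpu-memory-utilization 0.95 \
  --max-model-len 8192
```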