可否考虑添加llama.cpp推理引擎

Question

可否考虑添加llama.cpp推理引擎

Lookforworld opened this issue 10 months ago · 3 comments

Lookforworld commented 10 months ago

提交前必须检查以下项目 | The following items must be checked before submission

请确保使用的是仓库最新代码（git pull），一些问题已被解决和修复。 | Make sure you are using the latest code from the repository (git pull), some issues have already been addressed and fixed.
我已阅读项目文档和FAQ章节并且已在Issue中对问题进行了搜索，没有找到相似问题和解决方案 | I have searched the existing issues / discussions

问题类型 | Type of problem

模型推理和部署 | Model inference and deployment

操作系统 | Operating system

Linux

详细描述问题 | Detailed description of the problem

@xusenlinzy
可否添加llama.cpp推理，他们本身有sever服务，也提供了一套类open的sever脚本，但是脚本内容有待完善，我这两天试图用将llama.cpp server跟您的server整合，但是难度有点高，各种嵌套调用这两天给我整麻了，各种调试不通过，所以希望作者出马整合llama.cpp引擎。（PS,vLLM门槛太高，硬件要求不达标安装不了😭）

Dependencies

# 请在此处粘贴依赖情况
# Please paste the dependencies here

运行日志或截图 | Runtime logs or screenshots

# 请在此处粘贴运行日志
# Please paste the run log here

Answer 1 · 2023-11-24T08:05:11.000Z

https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/LLAMA_CPP.md

更新了一下，你可以测试有没有问题哈

Answer 2 · 2023-11-24T15:29:51.000Z

https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/LLAMA_CPP.md

更新了一下，你可以测试有没有问题哈

@xusenlinzy 非常感谢！我其实更建议llama.cpp，他们提供了api_like_OAI.py这个python接口。这个c版本的更稳定，我用python版本的会时不时报cuda 102，但是c版本的不会。当然了，依然非常感谢！

Answer 3 · 2024-01-10T10:57:49.000Z

https://github.com/xusenlinzy/api-for-open-llm/blob/master/docs/LLAMA_CPP.md

更新了一下，你可以测试有没有问题哈

可用, 但是速度很慢, 应该llama.cpp本身就慢