keldenl/gpt-llama.cpp

Duplication of capabilities?

das-sein opened this issue · 1 comment

Filing this issue primarily to make the developers aware of the existing llama-cpp-python web server that accomplishes the same thing and also has endpoint documentation with examples baked in:

https://abetlen.github.io/llama-cpp-python/#web-server


Hopefully this is useful, or could reduce future development effort when addressing the other open issues requesting support for additional popular open LLMs.

Yes, but I ran into some issues with llama-cpp-python, which is why I made gpt-llama.cpp in the first place.

Also, gpt-llama.cpp is more lightweight because it simply invokes the user's existing local llama.cpp build. That means picking up new changes and improvements (like the recent CUDA and 5-bit quantization support) is as easy as updating the llama.cpp project itself; gpt-llama.cpp doesn't have to do anything :)
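To illustrate the wrapper idea described above, here is a minimal sketch (not gpt-llama.cpp's actual code) of how an OpenAI-style completion request might be translated into arguments for a user's locally built llama.cpp binary. The function name, model path, and defaults are assumptions for illustration; the flags follow llama.cpp's `main` example (`-m`, `-p`, `-n`, `--temp`):

```javascript
// Hypothetical sketch: map an OpenAI-style completion request onto
// CLI arguments for the user's local llama.cpp binary.
function buildLlamaArgs(modelPath, request) {
  return [
    "-m", modelPath,                           // user's local model file
    "-p", request.prompt,                      // prompt text
    "-n", String(request.max_tokens ?? 128),   // tokens to generate
    "--temp", String(request.temperature ?? 0.8),
  ];
}

// A wrapper server would then spawn the binary the user already built, e.g.:
//   const { spawn } = require("child_process");
//   spawn("./llama.cpp/main", buildLlamaArgs("./models/7B/ggml-model.bin", req));

const args = buildLlamaArgs("./models/7B/ggml-model.bin", {
  prompt: "Hello",
  max_tokens: 64,
});
console.log(args.join(" "));
```

Because the wrapper only builds arguments and spawns the binary, any upgrade to llama.cpp (new flags, new quantization formats, GPU support) is inherited by rebuilding llama.cpp, with no changes to the wrapper.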