keldenl/gpt-llama.cpp

Duplication of capabilities?

das-sein opened this issue · 1 comment

Filing this issue primarily to make the developers aware of the existing llama-cpp-python web server that accomplishes the same thing and also has endpoint documentation with examples baked in:

https://abetlen.github.io/llama-cpp-python/#web-server


Hopefully this is useful, or could reduce future development effort when addressing the other open issues requesting support for additional popular open LLMs.

Yes, but I ran into some issues with llama-cpp-python, which is why I made gpt-llama.cpp in the first place.

Also, gpt-llama.cpp is more lightweight because it simply invokes the user's existing local llama.cpp build. That means picking up new changes and improvements (like the recent CUDA and 5-bit quantization support) is as easy as updating the llama.cpp project itself; gpt-llama.cpp doesn't have to do anything :)
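To illustrate the wrapper idea described above, here is a minimal sketch (not gpt-llama.cpp's actual code) of how an OpenAI-style completion request might be translated into arguments for a user's locally built llama.cpp binary. The function name, model path, and defaults are assumptions for illustration; the flags follow llama.cpp's `main` example (`-m`, `-p`, `-n`, `--temp`):

```javascript
// Hypothetical sketch: map an OpenAI-style completion request onto
// CLI arguments for the user's local llama.cpp binary.
function buildLlamaArgs(modelPath, request) {
  return [
    "-m", modelPath,                           // user's local model file
    "-p", request.prompt,                      // prompt text
    "-n", String(request.max_tokens ?? 128),   // tokens to generate
    "--temp", String(request.temperature ?? 0.8),
  ];
}

// A wrapper server would then spawn the binary the user already built, e.g.:
//   const { spawn } = require("child_process");
//   spawn("./llama.cpp/main", buildLlamaArgs("./models/7B/ggml-model.bin", req));

const args = buildLlamaArgs("./models/7B/ggml-model.bin", {
  prompt: "Hello",
  max_tokens: 64,
});
console.log(args.join(" "));
```

Because the wrapper only builds arguments and spawns the binary, any upgrade to llama.cpp (new flags, new quantization formats, GPU support) is inherited by rebuilding llama.cpp, with no changes to the wrapper.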