How to run a llama.cpp server with the FastChat API
xx-zhang opened this issue · 1 comment
xx-zhang commented
I have set up the server, but it only outputs a few words, as if blocked, and it runs as a single process that cannot respond quickly. It only loads the model when a request arrives.
fredi-python commented
How did you set up the server?
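For reference, FastChat's OpenAI-compatible API is normally served as three separate processes, so responses do not block on a single worker. This is a sketch of the standard FastChat launch sequence from its documentation, not a confirmed fix for this issue; the model path is a placeholder, and serving a llama.cpp model through FastChat requires a worker that supports that backend.

```shell
# Sketch of the usual FastChat serving stack (placeholder model path):

# 1. Controller: tracks registered workers.
python3 -m fastchat.serve.controller &

# 2. Model worker: loads the model once at startup and keeps it in memory,
#    so requests are not delayed by per-request model loading.
python3 -m fastchat.serve.model_worker --model-path /path/to/model &

# 3. OpenAI-compatible API server: forwards requests to the worker.
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000 &
```

With this layout the model stays resident in the worker process between requests, which avoids the load-on-request behavior described above.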