Long generations are cut off in the webui
With standard settings and both models I have (Alpaca 7B and 13b-gpt-x), longer generations are cut off in the webui.
After a while, text stops appearing. The debug console shows only status messages and no more "polling" messages, but CPU usage stays up and the UI still shows the "stop generating" button.
Upon pressing that button a minute later, the console shows the much longer message (it was still generating), but the message is never shown in the web interface.
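In case it helps, here is a minimal sketch of the failure mode I suspect, written as hypothetical client code (not the actual webui source, and all names and timeout values are assumptions): a polling loop that gives up after a fixed quiet period while the server keeps generating.

```python
# Hypothetical sketch of the suspected failure mode -- not the actual
# webui source. The client polls an HTTP endpoint for new tokens but
# gives up after a fixed timeout, while the server keeps generating.
import time
import requests  # assumption: the webui polls over HTTP

POLL_INTERVAL = 0.5   # seconds between polls (assumed value)
POLL_TIMEOUT = 60.0   # give up after this much silence (assumed value)

def poll_for_tokens(url: str) -> str:
    text = ""
    last_progress = time.time()
    while time.time() - last_progress < POLL_TIMEOUT:
        chunk = requests.get(url).json().get("text", "")  # tokens so far
        if chunk:
            text += chunk
            last_progress = time.time()  # reset the timeout on progress
        time.sleep(POLL_INTERVAL)
    # If the server stalls longer than POLL_TIMEOUT, the loop exits here:
    # the UI stops updating, but server-side generation (and CPU usage)
    # continues -- matching the symptom described above.
    return text
```

If something like this is in play, pressing "stop generating" would flush the full server-side buffer to the console, which matches what I see.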
Can you provide the entire transcript?
I suspect this issue is somehow related to Windows.
Actually, I have pushed some changes to my fork of llama.cpp that I think might fix the issue.
There is something weird... when I post a medium-size text, I immediately see this:
inp( #2) : inp( #2) : inp( #2) : inp( #2) : inp( #2) : inp( #2) : inp( #2)
then the answer to the question, followed by a loop:
### Human: continue
### Assistant: Additionally, ...
I find this behaviour a bit concerning on many new models, where the stream runs unbounded in just these Human/Assistant loops.
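A common mitigation is to treat the chat markers as stop sequences so generation halts as soon as the model starts a new turn; llama.cpp exposes a similar mechanism via its reverse-prompt option. Below is a minimal sketch under assumed names (`next_token`, `generate_with_stops`, and the marker strings are illustrative, not the project's actual API):

```python
# Minimal sketch of a stop-sequence check, using assumed names.
STOP_SEQUENCES = ["### Human:", "### Assistant:"]  # assumed chat markers

def generate_with_stops(next_token, max_tokens: int) -> str:
    """next_token is a hypothetical callable returning one token string per call."""
    out = ""
    for _ in range(max_tokens):
        out += next_token()
        # Halt as soon as the model starts a new Human/Assistant turn,
        # which prevents the unbounded self-chat loop described above.
        for stop in STOP_SEQUENCES:
            if stop in out:
                return out.split(stop, 1)[0]
    return out
```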
This should be fixed in the latest release. Please check and confirm.