Long generations are cut off in the webui
With standard settings and both models I have (Alpaca 7B and 13b-gpt-x), longer generations are cut off in the webui.
After a while, text stops appearing. The debug console shows only status messages and no more "polling" messages, but CPU usage stays up and the UI still shows the "stop generating" button.
Upon pressing that button a minute later, the console shows the much longer message (it was still generating), but the message is never shown in the web interface.
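In case it helps, here is a minimal sketch of the failure mode I suspect, written as hypothetical client code (not the actual webui source, and all names and timeout values are assumptions): a polling loop that gives up after a fixed quiet period while the server keeps generating.

```python
# Hypothetical sketch of the suspected failure mode -- not the actual
# webui source. The client polls an HTTP endpoint for new tokens but
# gives up after a fixed timeout, while the server keeps generating.
import time
import requests  # assumption: the webui polls over HTTP

POLL_INTERVAL = 0.5   # seconds between polls (assumed value)
POLL_TIMEOUT = 60.0   # give up after this much silence (assumed value)

def poll_for_tokens(url: str) -> str:
    text = ""
    last_progress = time.time()
    while time.time() - last_progress < POLL_TIMEOUT:
        chunk = requests.get(url).json().get("text", "")  # tokens so far
        if chunk:
            text += chunk
            last_progress = time.time()  # reset the timeout on progress
        time.sleep(POLL_INTERVAL)
    # If the server stalls longer than POLL_TIMEOUT, the loop exits here:
    # the UI stops updating, but server-side generation (and CPU usage)
    # continues -- matching the symptom described above.
    return text
```

If something like this is in play, pressing "stop generating" would flush the full server-side buffer to the console, which matches what I see.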
Can you provide the entire transcript?
I suspect this issue is somehow related to Windows.
Actually, I have pushed some changes to my fork of llama.cpp that I think might fix the issue.
There is something weird... when I post a medium-size text, I immediately see this:
inp( #2) : inp( #2) : inp( #2) : inp( #2) : inp( #2) : inp( #2) : inp( #2)
then the answer to the question, followed by a loop:
### Human: continue
### Assistant: Additionally, ...
I find this behaviour a bit concerning on many new models, where the stream runs unbounded in just these Human/Assistant loops.
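A common mitigation is to treat the chat markers as stop sequences so generation halts as soon as the model starts a new turn; llama.cpp exposes a similar mechanism via its reverse-prompt option. Below is a minimal sketch under assumed names (`next_token`, `generate_with_stops`, and the marker strings are illustrative, not the project's actual API):

```python
# Minimal sketch of a stop-sequence check, using assumed names.
STOP_SEQUENCES = ["### Human:", "### Assistant:"]  # assumed chat markers

def generate_with_stops(next_token, max_tokens: int) -> str:
    """next_token is a hypothetical callable returning one token string per call."""
    out = ""
    for _ in range(max_tokens):
        out += next_token()
        # Halt as soon as the model starts a new Human/Assistant turn,
        # which prevents the unbounded self-chat loop described above.
        for stop in STOP_SEQUENCES:
            if stop in out:
                return out.split(stop, 1)[0]
    return out
```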
This should be fixed in the latest release. Please check and confirm.