Streaming responses time out
Opened this issue · 7 comments
Streaming responses can currently be enabled using the environment variable ENABLE_STREAMING_RESPONSE.
The issue is that streaming responses eventually lead to a timeout error from Telegram.
The current measure to counter this (streaming an update every 2 seconds) is inadequate.
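For context, here is a minimal sketch of what the current time-throttled approach looks like conceptually; stream_reply, generate_tokens, bot, and message are illustrative placeholders, not this repo's actual names:

import time

# Minimal sketch (not this repo's actual code) of the time-throttled
# approach: only call the Telegram API every `interval` seconds while
# streaming. `generate_tokens`, `bot`, and `message` are placeholders.
async def stream_reply(bot, message, generate_tokens, interval: float = 2.0):
    buffer, sent, last_edit = "", None, 0.0
    async for token in generate_tokens():
        buffer += token
        text = buffer.strip()
        now = time.monotonic()
        if not text or now - last_edit < interval:
            continue
        if sent is None:
            # First update: send the message that later edits will modify.
            sent = await bot.send_message(chat_id=message.chat.id, text=text,
                                          reply_to_message_id=message.message_id)
        else:
            await bot.edit_message_text(chat_id=message.chat.id,
                                        message_id=sent.message_id, text=text)
        last_edit = now
    # (A final flush of any trailing text is omitted for brevity.)
    return sent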
Ohh. It's rarely reproduced with the default MESSAGE_CHUNK_SIZE=5. If I set MESSAGE_CHUNK_SIZE to 1, the issue is reproduced more often; conversely, if I set it to 20, I don't see any timeout.
So I want to check whether there is a buffer overload in the socket, or whether Telegram really does take that long to respond.
If setting it to 20 almost stops the timeouts, we might make it the default value of MESSAGE_CHUNK_SIZE.
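To spell out why the value matters: with chunk-count throttling, an edit only happens after MESSAGE_CHUNK_SIZE new chunks have arrived, so raising it directly cuts the number of Telegram calls per response. A rough sketch of that gating, with placeholder helpers (relay, generate_chunks, update_message) rather than the repo's real code:

import os

# Assumed semantics of MESSAGE_CHUNK_SIZE: the Telegram message is edited
# only once every MESSAGE_CHUNK_SIZE streamed chunks, so a larger value
# means fewer API calls per response.
MESSAGE_CHUNK_SIZE = int(os.getenv("MESSAGE_CHUNK_SIZE", "5"))

async def relay(generate_chunks, update_message):
    # generate_chunks yields model output; update_message sends or edits
    # the Telegram message. Both are placeholders.
    buffer, pending = "", 0
    async for chunk in generate_chunks():
        buffer += chunk
        pending += 1
        if pending < MESSAGE_CHUNK_SIZE or not buffer.strip():
            continue
        pending = 0
        await update_message(buffer.strip())
    if buffer.strip():
        await update_message(buffer.strip())  # flush whatever is left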
Hey @masalyuk, any progress?
@tusharhero Sorry for the delay. I want to check something today to find the optimal size of MESSAGE_CHUNK_SIZE.
@tusharhero Even with a MESSAGE_CHUNK_SIZE set to 10, I don't see any timeout, even if I've asked to write 50 sentences.
Moreover, increasing this parameter decreases the probability of encountering a Flood control error or a Bad Message error. Therefore, let's change it to 20 to ensure that we won't face such issues.
It is probably a good idea to place an additional warning somewhere to notify users not to use small MESSAGE_CHUNK_SIZE values.
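One possible shape for that warning at startup; the threshold of 10 and the logging setup are assumptions, not something already in the repo:

import logging
import os

log = logging.getLogger(__name__)

MESSAGE_CHUNK_SIZE = int(os.getenv("MESSAGE_CHUNK_SIZE", "20"))

# Assumption: values below ~10 are the ones that triggered timeouts and
# flood-control errors in testing, so warn the user at startup.
if MESSAGE_CHUNK_SIZE < 10:
    log.warning(
        "MESSAGE_CHUNK_SIZE=%d is low; small values cause frequent message "
        "edits and can trigger Telegram timeouts or flood control.",
        MESSAGE_CHUNK_SIZE,
    )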
Hey @masalyuk,
Even with a MESSAGE_CHUNK_SIZE set to 10, I don't see any timeout, even if I've asked to write 50 sentences.
Interesting, I wonder how you are testing it, because when I try to test it, it often stops in the middle of inference due to these errors.
Moreover, increasing this parameter decreases the probability of encountering a Flood control error or a Bad Message error. Therefore, let's change it to 20 to ensure that we won't face such issues.
I have also tried setting the value as high as 150, and the error still persists. Maybe the current value is not being read from the configuration file. Can you investigate this? (A quick check is sketched at the end of this comment.)
It is probably a good idea to place an additional warning somewhere to notify users not to use small MESSAGE_CHUNK_SIZE values.
That sounds like a good idea. That should be mentioned in docs/setup.md.
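As a quick way to check whether the configured value is actually being picked up, logging the effective setting at startup should tell us immediately (assuming it comes from the environment; adjust if it is read from a config file instead):

import logging
import os

logging.basicConfig(level=logging.INFO)
logging.info("MESSAGE_CHUNK_SIZE seen at startup: %r",
             os.getenv("MESSAGE_CHUNK_SIZE"))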
I think we may be able to take some inspiration from this code from ruecat/ollama-telegram:
# Excerpt from the message handler in ruecat/ollama-telegram;
# full_response, sent_message, and last_sent_text are assumed to be
# initialized before this loop.
async for response_data in generate(payload, modelname, prompt):
    msg = response_data.get("message")
    if msg is None:
        continue
    chunk = msg.get("content", "")
    full_response += chunk
    full_response_stripped = full_response.strip()
    # avoid Bad Request: message text is empty
    if full_response_stripped == "":
        continue
    # Only touch the Telegram API when the chunk ends a sentence or line,
    # which throttles edits without relying on a fixed chunk count.
    if "." in chunk or "\n" in chunk or "!" in chunk or "?" in chunk:
        if sent_message:
            # Skip the edit when the visible text has not changed.
            if last_sent_text != full_response_stripped:
                await bot.edit_message_text(
                    chat_id=message.chat.id,
                    message_id=sent_message.message_id,
                    text=full_response_stripped,
                )
                last_sent_text = full_response_stripped
        else:
            # First sentence: send the initial reply that later edits update.
            sent_message = await bot.send_message(
                chat_id=message.chat.id,
                text=full_response_stripped,
                reply_to_message_id=message.message_id,
            )
            last_sent_text = full_response_stripped