Streaming responses time out
Opened this issue · 7 comments
Streaming responses can currently be enabled using the environment variable ENABLE_STREAMING_RESPONSE.
The issue is that streaming responses eventually lead to a timeout error from Telegram.
The current measure to counter this (streaming an update every 2 seconds) is inadequate.
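For context, here is a minimal sketch of what the current time-throttled approach looks like conceptually; stream_reply, generate_tokens, bot, and message are illustrative placeholders, not this repo's actual names:

import time

# Minimal sketch (not this repo's actual code) of the time-throttled
# approach: only call the Telegram API every `interval` seconds while
# streaming. `generate_tokens`, `bot`, and `message` are placeholders.
async def stream_reply(bot, message, generate_tokens, interval: float = 2.0):
    buffer, sent, last_edit = "", None, 0.0
    async for token in generate_tokens():
        buffer += token
        text = buffer.strip()
        now = time.monotonic()
        if not text or now - last_edit < interval:
            continue
        if sent is None:
            # First update: send the message that later edits will modify.
            sent = await bot.send_message(chat_id=message.chat.id, text=text,
                                          reply_to_message_id=message.message_id)
        else:
            await bot.edit_message_text(chat_id=message.chat.id,
                                        message_id=sent.message_id, text=text)
        last_edit = now
    # (A final flush of any trailing text is omitted for brevity.)
    return sent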
Ohh. It's rarely reproduced with the default MESSAGE_CHUNK_SIZE=5. If I set MESSAGE_CHUNK_SIZE to 1, the issue is reproduced more often; conversely, if I set it to 20, I don't see any timeout.
So I want to check whether there is a buffer overload in the socket, or whether Telegram really does take that long to respond.
If setting it to 20 almost stops the timeouts, we might make it the default value of MESSAGE_CHUNK_SIZE.
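To spell out why the value matters: with chunk-count throttling, an edit only happens after MESSAGE_CHUNK_SIZE new chunks have arrived, so raising it directly cuts the number of Telegram calls per response. A rough sketch of that gating, with placeholder helpers (relay, generate_chunks, update_message) rather than the repo's real code:

import os

# Assumed semantics of MESSAGE_CHUNK_SIZE: the Telegram message is edited
# only once every MESSAGE_CHUNK_SIZE streamed chunks, so a larger value
# means fewer API calls per response.
MESSAGE_CHUNK_SIZE = int(os.getenv("MESSAGE_CHUNK_SIZE", "5"))

async def relay(generate_chunks, update_message):
    # generate_chunks yields model output; update_message sends or edits
    # the Telegram message. Both are placeholders.
    buffer, pending = "", 0
    async for chunk in generate_chunks():
        buffer += chunk
        pending += 1
        if pending < MESSAGE_CHUNK_SIZE or not buffer.strip():
            continue
        pending = 0
        await update_message(buffer.strip())
    if buffer.strip():
        await update_message(buffer.strip())  # flush whatever is left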
Hey @masalyuk, any progress?
@tusharhero Sorry for the delay. I want to check something today to find the optimal size of MESSAGE_CHUNK_SIZE.
@tusharhero Even with a MESSAGE_CHUNK_SIZE set to 10, I don't see any timeout, even if I've asked to write 50 sentences.
Moreover, increasing this parameter decreases the probability of encountering a Flood control error or a Bad Message error. Therefore, let's change it to 20 to ensure that we won't face such issues.
It is probably a good idea to place an additional warning somewhere to notify users not to use small MESSAGE_CHUNK_SIZE values.
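One possible shape for that warning at startup; the threshold of 10 and the logging setup are assumptions, not something already in the repo:

import logging
import os

log = logging.getLogger(__name__)

MESSAGE_CHUNK_SIZE = int(os.getenv("MESSAGE_CHUNK_SIZE", "20"))

# Assumption: values below ~10 are the ones that triggered timeouts and
# flood-control errors in testing, so warn the user at startup.
if MESSAGE_CHUNK_SIZE < 10:
    log.warning(
        "MESSAGE_CHUNK_SIZE=%d is low; small values cause frequent message "
        "edits and can trigger Telegram timeouts or flood control.",
        MESSAGE_CHUNK_SIZE,
    )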
Hey @masalyuk,
Even with a MESSAGE_CHUNK_SIZE set to 10, I don't see any timeout, even if I've asked to write 50 sentences.
Interesting, I wonder how you are testing it, because when I try to test it, it often stops in the middle of inference due to these errors.
Moreover, increasing this parameter decreases the probability of encountering a Flood control error or a Bad Message error. Therefore, let's change it to 20 to ensure that we won't face such issues.
I have also tried setting the value as high as 150, and the error still persists. Maybe the current value is not being read from the configuration file. Can you investigate this? (A quick check is sketched at the end of this comment.)
It is probably a good idea to place an additional warning somewhere to notify users not to use small MESSAGE_CHUNK_SIZE values.
That sounds like a good idea. That should be mentioned in docs/setup.md.
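As a quick way to check whether the configured value is actually being picked up, logging the effective setting at startup should tell us immediately (assuming it comes from the environment; adjust if it is read from a config file instead):

import logging
import os

logging.basicConfig(level=logging.INFO)
logging.info("MESSAGE_CHUNK_SIZE seen at startup: %r",
             os.getenv("MESSAGE_CHUNK_SIZE"))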
I think we may be able to take some inspiration from this code from ruecat/ollama-telegram:
# Excerpt from the message handler in ruecat/ollama-telegram;
# full_response, sent_message, and last_sent_text are assumed to be
# initialized before this loop.
async for response_data in generate(payload, modelname, prompt):
    msg = response_data.get("message")
    if msg is None:
        continue
    chunk = msg.get("content", "")
    full_response += chunk
    full_response_stripped = full_response.strip()
    # avoid Bad Request: message text is empty
    if full_response_stripped == "":
        continue
    # Only touch the Telegram API when the chunk ends a sentence or line,
    # which throttles edits without relying on a fixed chunk count.
    if "." in chunk or "\n" in chunk or "!" in chunk or "?" in chunk:
        if sent_message:
            # Skip the edit when the visible text has not changed.
            if last_sent_text != full_response_stripped:
                await bot.edit_message_text(
                    chat_id=message.chat.id,
                    message_id=sent_message.message_id,
                    text=full_response_stripped,
                )
                last_sent_text = full_response_stripped
        else:
            # First sentence: send the initial reply that later edits update.
            sent_message = await bot.send_message(
                chat_id=message.chat.id,
                text=full_response_stripped,
                reply_to_message_id=message.message_id,
            )
            last_sent_text = full_response_stripped