
The `send_message_streaming` function does not raise an error when the maximum context length is exceeded.



If a message exceeds the maximum allowed context length for the conversation, the standard `send_message` function raises an error, for example:

Error: BackendError { message: "This model's maximum context length is 4097 tokens. However, your messages resulted in 5000 tokens. Please reduce the length of the messages.", error_type: "invalid_request_error" }

However, when the same message is sent using the `send_message_streaming` function, no error is raised. Instead, the conversation becomes permanently stuck.

Is there a way to work around this issue?


Does the underlying API even return any errors when streaming a response? It seems that whenever an error occurs, the server simply stops sending message parts.

It seems that whenever the message is too long, the server answers with status 400, which could be intercepted before calling `bytes_stream()`.

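        let response_stream = self
            // ... (request builder calls omitted in this snippet)
            .json(&CompletionRequest {
                model: self.config.engine.as_ref(),
                stream: true,
                messages: history,
                temperature: self.config.temperature,
                top_p: self.config.top_p,
                frequency_penalty: self.config.frequency_penalty,
                presence_penalty: self.config.presence_penalty,
                reply_count: self.config.reply_count,
            })
            // ... (the request is sent and awaited here; omitted in this snippet)
        // Return early on a 400 instead of waiting for a stream that never arrives.
        if response_stream.status() == 400 {
            return Err(crate::err::Error::ParsingError("Is your message too large?".to_string()));
        }
        let response_stream = response_stream
            // ... (continues with bytes_stream(), as in the original code)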

Original code at

(Note: In this code snippet, I used the existing `ParsingError` because it allowed me to pass a string. However, it might be more appropriate to define a new error specifically for this scenario.)
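
A dedicated error for this case could look roughly like the sketch below. It uses only the standard library, and every name in it (`StreamingError`, `RequestRejected`, `ensure_streamable`) is hypothetical and not part of the crate. The `status` and `body` parameters stand in for whatever the HTTP client reports. Treating every non-2xx status as a rejection is an assumption; the thread has only observed status 400.

```rust
use std::fmt;

/// Hypothetical error for a request the server rejects before streaming starts.
/// Not part of the crate; the names are made up for illustration.
#[derive(Debug, PartialEq)]
pub enum StreamingError {
    /// The server answered with a non-success HTTP status.
    RequestRejected { status: u16, message: String },
}

impl fmt::Display for StreamingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StreamingError::RequestRejected { status, message } => {
                write!(f, "request rejected with status {status}: {message}")
            }
        }
    }
}

impl std::error::Error for StreamingError {}

/// Checks the status code before the body is consumed as a stream.
/// Assumption: any status outside 200..300 means no stream will follow.
pub fn ensure_streamable(status: u16, body: &str) -> Result<(), StreamingError> {
    if (200..300).contains(&status) {
        Ok(())
    } else {
        // Keep the server's own message so the caller sees the real reason
        // (for example, the context-length message) instead of a guess.
        Err(StreamingError::RequestRejected {
            status,
            message: body.to_string(),
        })
    }
}
```

Carrying the response body in the error would let the caller surface the same "maximum context length" message that `send_message` already reports, rather than the generic "Is your message too large?" hint.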