replicate/replicate-python

Randomly getting truncated output from client.run() for streaming models using replicate 0.24.0

beatty opened this issue · 1 comment

With the replicate 0.24.0 Python client and "mistralai/mistral-7b-instruct-v0.2" (a model that supports streaming), the iterator I get back from client.run() frequently returns truncated output, perhaps 1 in 50 runs. I checked the Replicate dashboard for the prediction IDs where I see truncation, and the full response was recorded for all of them.

I saw the same behavior with replicate 0.23 as well.

I looked for easy workarounds and couldn't find any (can I disable streaming?).
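For reference, this is roughly how I'm calling it (the API token and prompt below are placeholders, not my actual values):

```python
import replicate

client = replicate.Client(api_token="r8_...")  # placeholder token

# For this model, run() returns an iterator over the output tokens
output = client.run(
    "mistralai/mistral-7b-instruct-v0.2",
    input={"prompt": "Write a short poem about rivers."},
)

# Roughly 1 in 50 runs, the joined text is missing the tail of the response
text = "".join(output)
print(text)
```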

Hi @beatty. Thanks for letting us know. I'll take a look to see what's going on. Can you share an ID of a prediction that had truncated output?

The run method doesn't actually use the streaming interface. Instead, it conditionally returns an iterator over the list of tokens once the prediction finishes. If you're seeing this behavior consistently, you might try calling the stream method instead.
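For example, a minimal sketch of that approach, assuming the client object exposes stream() the same way it exposes run() (the prompt here is illustrative):

```python
import replicate

client = replicate.Client(api_token="r8_...")  # placeholder token

# stream() consumes the server-sent event stream directly and yields
# output tokens as the model produces them, rather than waiting for
# the prediction to finish
for event in client.stream(
    "mistralai/mistral-7b-instruct-v0.2",
    input={"prompt": "Write a short poem about rivers."},
):
    print(str(event), end="")
```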