replicate/replicate-python

documentation: error handling

RichardNeill opened this issue · 1 comment

It would be really helpful if there was a documented example on how to catch common error-cases.
Might I request a documentation example, along the following lines:

try:
    for event in replicate.stream(LLM_MODEL, ...):
        print(str(event), end="")
except ...:  # please document what goes here
    print("Error: timeout occurred after 34 seconds")
except ...:  # please document what goes here
    print("Error: too many input tokens: you supplied 5034 when 4096 is the maximum")
except ...:  # please document what goes here
    print("Error: output was truncated because we exceeded the context length before completion")

At the moment, all I get is a crash with a runtime error (if the token limit is exceeded), and complete silence if a timeout occurs.

Relatedly, is there any way to get the measured input and output token count?
I'm currently working with the approximation:
tokens = round(len(text.split()) * 4 / 3)
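For reference, that rule of thumb can be written as a small helper. This is just the same ~4/3 words-to-tokens heuristic, not a real tokenizer, so treat its output as an estimate only:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 tokens per 3 whitespace-separated words."""
    return round(len(text.split()) * 4 / 3)

print(estimate_tokens("a photo of an astronaut riding a horse"))  # 8 words -> 11
```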

Thanks for your time.

Hi @RichardNeill. Thanks for opening this issue. I'm happy to report that I just merged #263, which should help make API errors more understandable and actionable.

import replicate
from replicate.exceptions import ReplicateError

try:
    output = replicate.run(
        "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
        input={"prompt": "a photo of an astronaut riding a horse on Mars"},
    )
    print(output)
except ReplicateError as e:
    print(f"An error occurred: {e.status} - {e.detail}")
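If you want to branch on common failure modes, one approach is to map the error's status code to a user-facing message. The status codes below are illustrative assumptions about what the API might return for these cases, not a documented contract:

```python
def describe_error(status, detail):
    """Map an HTTP status code to a friendlier message.

    The specific codes here are assumptions for illustration,
    not a documented mapping.
    """
    if status == 422:
        return f"Invalid input: {detail}"
    if status == 429:
        return f"Rate limited: {detail}"
    return f"API error ({status}): {detail}"

# Usage inside the except block above:
#     print(describe_error(e.status, e.detail))
```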

But I agree that there's more we can do to document this.

Relatedly, is there any way to get the measured input and output token count?

Yes. For models like meta/llama-2-70b-chat that support per-token pricing, you can get input and output token counts, as well as time to first token and tokens per second, from the prediction's metrics field.
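As a sketch, assuming a completed prediction whose metrics dict carries the token-count keys described above, you could read the counts like this. The sample payload is made up for illustration; the key names are assumed from Replicate's documentation:

```python
# Hypothetical metrics payload, shaped like prediction.metrics for a
# per-token-priced language model (key names assumed, values invented).
sample_metrics = {
    "input_token_count": 27,
    "output_token_count": 512,
    "time_to_first_token": 0.41,
    "tokens_per_second": 35.2,
}

def token_counts(metrics):
    """Return (input, output) token counts, defaulting to 0 if a key is absent."""
    metrics = metrics or {}
    return (
        metrics.get("input_token_count", 0),
        metrics.get("output_token_count", 0),
    )

print(token_counts(sample_metrics))  # (27, 512)
```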