[FEATURE REQ] Increase client-side timeout values and / or implement exponential-backoff retry logic for HTTP 408/429 errors.
Closed this issue · 3 comments
Describe the feature or improvement you are requesting
Ask - Update all Azure OpenAI SDKs, quick-start samples, and portal-generated code snippets to 1) set a default request timeout of 300 s or 600 s, 2) include built-in exponential-backoff retries for 408/429/500/503 that honor the Retry-After header, and 3) emit warnings when client-supplied timeouts are below 120 s. This would prevent premature disconnects and automatically improve service reliability for customers using the SDK.
Additional context
No response
Hi @jtanner-msft. Thank you for reaching out and for your suggestion. This opens an interesting line of inquiry about the right developer experience for long-running operations such as this. Generally, it is not desirable to have an HTTP request execute for 5-10 minutes with no feedback, as it opens the question "how do I know if this is working or has hung." We're looking into patterns that would potentially provide a more responsive experience around these so that an application receives feedback.
Your suggestion for adding context to samples to demonstrate setting a higher network timeout is a good one. We'll take a look at doing that in the short term.
Also, @jtanner-msft, I am not sure if you are aware, but you can change the timeout yourself. It's configurable using OpenAIClientOptions, which can be passed to the client's constructors.
Also, the library already honors the Retry-After header: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/core/System.ClientModel/src/Pipeline/ClientRetryPolicy.cs#L300
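For readers unfamiliar with the pattern under discussion, here is a minimal, language-neutral sketch in Python of exponential backoff that prefers a server-supplied Retry-After value. It is not the code in ClientRetryPolicy.cs. The function name `retry_delay`, the `base` and `cap` defaults, and the full-jitter choice are all assumptions made for illustration, and the set of retryable status codes is simply the one proposed in the request above.

```python
import random
from typing import Mapping, Optional

# Status codes named in the feature request (assumed set for this sketch).
RETRYABLE_STATUS = {408, 429, 500, 503}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,   # hypothetical first-retry delay, in seconds
    cap: float = 60.0,   # hypothetical upper bound on the computed delay
    jitter: bool = True,
) -> Optional[float]:
    """Return how many seconds to wait before retrying, or None to give up.

    `attempt` is zero-based: 0 for the first retry, 1 for the second, etc.
    `headers` is assumed to use the exact key "Retry-After"; a real client
    would look the header up case-insensitively.
    """
    if status not in RETRYABLE_STATUS:
        return None

    # Prefer the server's hint when it is given as a number of seconds.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            # The HTTP-date form of Retry-After is not parsed in this sketch;
            # fall back to the computed backoff below.
            pass

    # Exponential backoff: base * 2^attempt, bounded by cap.
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # Full jitter: pick uniformly in [0, delay] to spread out retries.
        delay = random.uniform(0.0, delay)
    return delay
```

A caller would wrap the request in a loop, sleep for the returned delay, and stop when the function returns None or a maximum attempt count is reached. The per-request network timeout is a separate setting and still has to be raised through the client options.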
Closing for now. We changed the error message of the exception thrown when a timeout happens to point users to documentation on how to adjust the timeout. Let's see if this fixes/alleviates the issue. If we still get feedback from users that it's not enough, we can reconsider adjusting the default timeout. But I don't want to do it prematurely as extending the timeout has negatives too.