Account for `retry-after-ms` header
Opened this issue · 0 comments
Presently, the module only accounts for the retry-after
header containing a value in seconds. Azure OpenAI Provisioned Throughput (PTU) deployments also pass back the retry-after-ms
header containing a value in milliseconds. This value may be preferential to seconds.
From the PTU documentation:
*A 429 response indicates that the allocated PTUs are fully consumed at the time of the call. The response includes the retry-after-ms
and retry-after headers
that tell you the time to wait before the next call will be accepted. *
As such, there is only a longer delay when not using retry-after-ms
when there is simply no PTU left to consume (overall and/or tokens-per-minute?). This may be an acceptable situation that may not surface too often.
One approach may be to convert everything internal to the module to milliseconds. These resources provide the starting point for the enhancement: