Problems using Configurable Retry Policy
Closed this issue · 5 comments
Looking at the documentation on Configurable Retry Policy Support:
retry_count
is documented as "Number of times a request is tried upon encountering a specific response code, such as 429." however I don't think the implementation respects this -- I believe it will make a total ofretry_count
attempts across any error type.retry_delay
andretry_max_delay
documentation does not make it clear that these parameters control the full jitter algorithm. Also, they probably don't need to be integers.- In particular,
retry_max_delay
is documented as "The upper limit for the amount of time, in seconds, for which the retries will be attempted for." which is not respected, but would be useful to have. (i.e. I think you should addtenacity.stop_after_delay(configuration.retry_max_delay)
to theRetrying(stop=...)
argument, so that there is a way to limit the total amount of time spent retrying.)
Separately, it doesn't look like metrics are getting published during retries because opsgenie_sdk.rest.RESTClientObject.request()
does not publish HTTP Metrics in the scenario where pool_manager.request()
raises an exception. This is making it harder for me to debug what is going on.
Additional observations:
- The
Configuration
object has an undocumentedretries
argument which (I think) is unused, but comes up in intellisense auto-complete lists. - The
back_off
parameter is not prefixed withretry_
like all the rest of the parameters for retry logic.retry_back_off
might be more intuitive (and easier to find in intellisense auto-complete lists)
- It's not possible to disable retry logic by setting
retry_enabled = False
because this parameter is quietly being ignored due to a bug. Theand
below needs to be an&
for this to work as intended.
opsgenie-python-sdk/opsgenie_sdk/api_client.py
Lines 84 to 86 in f2a2103
Hi, thanks a bunch for pointing out all of this and sorry for the late response.
I have changed the explanations of the retry_count
and retry_delay
in the documentation to better align with what they actually do.
The bug which was preventing the disablement of retrying will be fixed, by implementing a custom callback function for deciding whether the retry should stop or not, when #29 is released.
It looks like that Tenacity doesn't support and
or &
operator in the retry keyword argument because when we changed to &
from and
, the retrying was stopped but wasn't working with it enabled.
I will remove the unused retries
parameter in the Configuration
object in 17f99e2
I agree that the back_off
parameter of the Configuration
object should be renamed to retry_back_off
, but doing so might introduce some breaking changes for some customers. So, I will be postponing it to the next major release perhaps.
And I am looking into why the metrics are not being published when pool_manager.request()
raises an exception at the moment
Coming back to metrics' publishing, http metrics are published only after an http request is completed. However, to make debugging easier, I have added a change to publish sdk metrics whenever an exception is thrown while attempting to call the api in b382348
I'm closing the ticket now. Thanks for the collaborative work guys. If we need an another improvement later, let's create a new issue then.