Retry on timeouts and connection errors
I'm harvesting from a server which frequently times out (requests.exceptions.Timeout). The request is then not retried even though I set max_retries, since the retry functionality only covers the case where you actually get a response from the server.
I would like to extend the retry functionality to also cover timeouts, but rather than increasing the complexity of the _request method further, I think it's worth considering switching to the tried and tested Retry class from urllib3. For some background on the class, see https://kevin.burke.dev/kevin/urllib3-retries/
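To make this concrete, here is a minimal sketch of how Retry could be wired into the requests session through an HTTPAdapter; the endpoint URL, retry counts and status codes below are illustrative, not a proposed API:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Handle connection errors, read timeouts and typical "try again later"
# status codes in one place instead of inside _request.
retry = Retry(
    total=8,
    connect=8,
    read=8,                                   # read timeouts are retried here
    backoff_factor=2,
    status_forcelist=(429, 500, 502, 503, 504),
    allowed_methods=("GET", "POST"),          # method_whitelist on urllib3 < 1.26
)

session = requests.Session()
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)

# A read timeout is now retried inside urllib3 instead of surfacing
# immediately as requests.exceptions.Timeout.
response = session.get(
    "https://example.org/oai",                # placeholder endpoint
    params={"verb": "Identify"},
    timeout=60,
)

If I read _request correctly, its manual retry handling could then largely go away in favour of this.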
Retry also handles the Retry-After header, so it shouldn't be that different from the current behaviour. The main difference is that it uses a backoff factor instead of a fixed sleep time:
sleep_time = backoff_factor * (2**retry_number)
Since OAI-PMH servers can be quite slow, we could set the default backoff factor to something like 2, to make the sleep time increase quickly. It is capped at BACKOFF_MAX = 120 seconds by default:
>>> for x in range(2, 10):
...     print('Retry %s : sleep time %.1f seconds' % (x, min(120, 2 * (2**x))))
...
Retry 2 : sleep time 8.0 seconds
Retry 3 : sleep time 16.0 seconds
Retry 4 : sleep time 32.0 seconds
Retry 5 : sleep time 64.0 seconds
Retry 6 : sleep time 120.0 seconds
Retry 7 : sleep time 120.0 seconds
Retry 8 : sleep time 120.0 seconds
Retry 9 : sleep time 120.0 seconds
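A rough sketch of that configuration (the status codes are illustrative, and respect_retry_after_header is already True by default, I'm only spelling it out):

from urllib3.util.retry import Retry

retry = Retry(
    total=8,
    backoff_factor=2,                  # exponential sleep between attempts, capped at 120 s by default
    status_forcelist=(429, 503),       # statuses that commonly carry a Retry-After header
    respect_retry_after_header=True,   # default: Retry-After, when present, is used instead of the computed backoff
)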
Breaking change: This means that the default_retry_after argument would no longer be supported.
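If anyone relies on a fixed sleep, it could probably still be approximated by overriding how Retry computes its backoff; a hypothetical sketch, where the fixed 30 seconds stand in for what default_retry_after provides today and only apply when the server sends no Retry-After header:

from urllib3.util.retry import Retry

class FixedSleepRetry(Retry):
    # Hypothetical workaround: sleep a fixed 30 seconds between attempts
    # instead of backing off exponentially (Retry-After still wins when present).
    def get_backoff_time(self):
        return 30

retry = FixedSleepRetry(total=5)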
Let me know what you think, and whether there is a chance a PR for this would be accepted.