No retries on non-HTTP network issues
When an HTTP error such as 429 occurs, everything is handled as expected: retries are triggered using the retry options of pydocumentdb's connection policy.
However, if there is a network error such that a server is not reachable, then this results in an immediate exception without retries. This is because of two things:
- pydocumentdb's retry_utility code only handles errors.HTTPFailure errors, which are HTTP errors corresponding to certain HTTP status codes, e.g. 429: https://github.com/Azure/azure-documentdb-python/blob/07e2f3f93ad5abeb114c2d2f83577c25d18f0bb4/pydocumentdb/retry_utility.py#L66
- pydocumentdb uses requests to do the actual network requests; however, it sets up the requests session with the defaults only, which doesn't enable retrying: https://github.com/Azure/azure-documentdb-python/blob/07e2f3f93ad5abeb114c2d2f83577c25d18f0bb4/pydocumentdb/document_client.py#L134 This in turn means the underlying urllib3 does not retry such requests: https://github.com/urllib3/urllib3/blob/1.19.1/urllib3/util/retry.py#L331-L336 (read would be false with default options).
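The first point can be illustrated with a minimal, standard-library-only sketch. It is not pydocumentdb's actual code: HTTPFailure here is a hypothetical stand-in for errors.HTTPFailure, and execute_with_http_retries and make_flaky are made-up helpers. It only shows how a retry loop that catches a single exception type lets a network error escape on the first attempt.

```python
class HTTPFailure(Exception):
    """Hypothetical stand-in for an HTTP-status error such as 429."""

    def __init__(self, status_code):
        super().__init__("HTTP %d" % status_code)
        self.status_code = status_code


def execute_with_http_retries(function, max_attempts=3):
    """Retry only on HTTPFailure; return (result, number of attempts)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return function(), attempts
        except HTTPFailure:
            if attempts >= max_attempts:
                raise
        # There is no handler for ConnectionError or timeouts,
        # so those propagate to the caller on the first attempt.


def make_flaky(error, failures):
    """Return a callable that raises `error` `failures` times, then succeeds."""
    state = {"calls": 0}

    def call():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise error
        return "ok"

    call.state = state
    return call


# An HTTP-level failure is retried until the call succeeds.
throttled = make_flaky(HTTPFailure(429), failures=2)
print(execute_with_http_retries(throttled))

# A network-level failure is not caught, so only one call is made.
unreachable = make_flaky(ConnectionError("server unreachable"), failures=2)
try:
    execute_with_http_retries(unreachable)
except ConnectionError:
    print("gave up after", unreachable.state["calls"], "call")
```
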
An approach as described in https://www.peterbe.com/plog/best-practice-with-retries-with-requests is typically used to enable retries with requests:
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


def requests_retry_session(
    retries=3,
    backoff_factor=0.3,
    status_forcelist=(500, 502, 504),
    session=None,
):
    session = session or requests.Session()
    retry = Retry(
        total=retries,
        read=retries,
        connect=retries,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


# Usage example...
response = requests_retry_session().get('https://www.peterbe.com/')
print(response.status_code)

s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
response = requests_retry_session(session=s).get(
    'https://www.peterbe.com'
)
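As long as the library does not mount such an adapter itself, one possible caller-side workaround is to wrap the SDK call in a small retry helper. The following is only a sketch under stated assumptions: call_with_network_retries and create_document are hypothetical names and not part of pydocumentdb or requests. It retries on OSError by default, which should also cover requests' exceptions since requests.exceptions.RequestException derives from IOError.

```python
import time


def call_with_network_retries(function, retries=3, backoff_factor=0.3,
                              retry_on=(OSError,), sleep=time.sleep):
    """Call `function`, retrying up to `retries` times on `retry_on` errors.

    Hypothetical helper for illustration; not part of pydocumentdb or requests.
    """
    attempt = 0
    while True:
        try:
            return function()
        except retry_on:
            if attempt >= retries:
                raise
            # Exponential backoff: the delay doubles after each failure.
            sleep(backoff_factor * (2 ** attempt))
            attempt += 1


# Demo with a fake operation that times out twice before succeeding.
calls = []
delays = []


def create_document():
    # Stand-in for an SDK call; the real call is an assumption here.
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("read timed out")
    return {"id": "example"}


# Record the delays instead of actually sleeping.
result = call_with_network_retries(create_document, sleep=delays.append)
print(result, len(calls), delays)
```

Note that retrying a write after a read timeout is not always safe, because the server may already have processed the first request; whether that is acceptable depends on the operation.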
The following is an exception trace resulting from trying to create a document in CosmosDB when the server is unreachable:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "C:\Python36\lib\http\client.py", line 1331, in getresponse
response.begin()
File "C:\Python36\lib\http\client.py", line 297, in begin
version, status, reason = self._read_status()
File "C:\Python36\lib\http\client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Python36\lib\socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "C:\Python36\lib\ssl.py", line 1009, in recv_into
return self.read(nbytes, buffer)
File "C:\Python36\lib\ssl.py", line 871, in read
return self._sslobj.read(len, buffer)
File "C:\Python36\lib\ssl.py", line 631, in read
v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\requests\adapters.py", line 445, in send
timeout=timeout
File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Python36\lib\site-packages\urllib3\util\retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "C:\Python36\lib\site-packages\urllib3\packages\six.py", line 686, in reraise
raise value
File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "C:\Python36\lib\site-packages\urllib3\connectionpool.py", line 306, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='....documents.azure.com', port=443): Read timed out. (read timeout=60.0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
File "C:\Python36\lib\site-packages\pydocumentdb\document_client.py", line 947, in CreateDocument
options)
File "C:\Python36\lib\site-packages\pydocumentdb\document_client.py", line 2365, in Create
headers)
File "C:\Python36\lib\site-packages\pydocumentdb\document_client.py", line 2571, in __Post
headers=headers)
File "C:\Python36\lib\site-packages\pydocumentdb\synchronized_request.py", line 212, in SynchronizedRequest
return retry_utility._Execute(client, global_endpoint_manager, _Request, connection_policy, requests_session, resource_url, request_options, request_body)
File "C:\Python36\lib\site-packages\pydocumentdb\retry_utility.py", line 56, in _Execute
result = _ExecuteFunction(function, *args, **kwargs)
File "C:\Python36\lib\site-packages\pydocumentdb\retry_utility.py", line 92, in _ExecuteFunction
return function(*args, **kwargs)
File "C:\Python36\lib\site-packages\pydocumentdb\synchronized_request.py", line 127, in _Request
verify = is_ssl_enabled)
File "C:\Python36\lib\site-packages\requests\sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python36\lib\site-packages\requests\sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "C:\Python36\lib\site-packages\requests\adapters.py", line 526, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='....documents.azure.com', port=443): Read timed out. (read timeout=60.0)
We are also encountering this issue. The default requests read timeout is currently set to 60 seconds, and there is no way to change it or to pass in retry logic.
Hi @srinathnarayanan, can you take a look at this issue? It is currently affecting our application in production. I can provide more info... I looked at the next version at https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/cosmos/azure-cosmos/azure/cosmos/_cosmos_client_connection.py and I don't see that this has been resolved there either. So this issue probably needs to be carried over to the next version as well.
The fix is present in versions 3.1.2 and 4.0.0b4 of azure-cosmos.