Work around hitting rate limits while polling Atlas API endpoints
prashantmital opened this issue
Atlas API resources are rate-limited on a per-project basis. Since every Evergreen build of this project uses the same Atlas project, the API rate limits can be hit when multiple builds run simultaneously.
In the absence of backoff/retry logic, hitting the rate limit causes the entire test run to fail with a message like:
INFO:astrolabe.runner:Initializing cluster '420b243009'
INFO:astrolabe.runner:Waiting for a test cluster to become ready
Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 976, in _validate_conn
    conn.connect()
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connection.py", line 308, in connect
    conn = self._new_conn()
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connection.py", line 172, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cloud.mongodb.com', port=443): Max retries exceeded with url: /api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/atlasclient/client.py", line 210, in request
    response = requests.request(method, url, **request_kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cloud.mongodb.com', port=443): Max retries exceeded with url: /api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "astrolabevenv/bin/astrolabe", line 33, in <module>
    sys.exit(load_entry_point('astrolabe', 'console_scripts', 'astrolabe')())
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/cli.py", line 441, in run_single_test
    failed = runner.run()
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/runner.py", line 323, in run
    args=("IDLE",), kwargs={})
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/poller.py", line 50, in poll
    return_value = self._check_ready(obj, attribute, args, kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/poller.py", line 67, in _check_ready
    return bool(getattr(obj, attribute)(*args, **kwargs))
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/runner.py", line 91, in is_cluster_state
    cluster_info = self.cluster_url.get().data
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/atlasclient/client.py", line 56, in get
    return self._client.request('GET', self._path, **params)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/atlasclient/client.py", line 215, in request
    request_method=method
atlasclient.exceptions.AtlasClientError: HTTPSConnectionPool(host='cloud.mongodb.com', port=443): Max retries exceeded with url: /api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)) (GET https://cloud.mongodb.com/api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009)
Command failed: error waiting on process '2d258a64-438f-4981-99f6-7403862b2caa': exit status 1
We should improve astrolabe to account for this failure mode by waiting and backing off appropriately when such errors are encountered.
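One possible shape for the wait/backoff behaviour is sketched below. It is a minimal, hypothetical helper (`retry_with_backoff`, `TransientAPIError` and `is_transient` are not part of astrolabe or atlasclient), assuming that rate-limited requests and transient connection failures, like the name-resolution error in the traceback above, can be recognized by a predicate. It uses capped exponential backoff with "full jitter": before the n-th retry it sleeps a random time in `[0, min(max_delay, base_delay * 2**n)]`, so that builds sharing one Atlas project do not all retry in lockstep.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) or connection error."""


def is_transient(exc: BaseException) -> bool:
    # Hypothetical predicate: in astrolabe this would have to recognize the
    # relevant atlasclient.exceptions.AtlasClientError cases instead.
    return isinstance(exc, TransientAPIError)


def retry_with_backoff(
    func: Callable[[], T],
    is_retryable: Callable[[BaseException], bool],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call func(), retrying retryable failures with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Re-raise non-transient errors, and give up after the last attempt.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: uniform sleep up to the capped exponential bound.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

In astrolabe, a call such as `self.cluster_url.get()` could be wrapped as `retry_with_backoff(lambda: self.cluster_url.get(), is_transient)`, with `is_transient` adapted to the exceptions atlasclient actually raises.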
Note that astrolabe polls the Atlas API for many purposes. The most commonly polled endpoint is https://docs.atlas.mongodb.com/reference/api/clusters-get-one/, whose output we use to determine cluster status (specifically, we glean provisioning status, maintenance status, etc. from clusterState).
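The polling loop itself could tolerate such errors by treating them as "not ready yet" and lengthening the wait before the next poll. The sketch below is hypothetical: `poll_until` and `cluster_is_idle` are illustrative names, not astrolabe's `Poller` API, and it assumes the Get One Cluster response is parsed into a dict whose `stateName` field holds the cluster state (the traceback above shows astrolabe waiting for the `"IDLE"` state). The clock and sleep functions are injectable so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    is_transient: Callable[[BaseException], bool],
    timeout: float = 1200.0,
    interval: float = 10.0,
    max_interval: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once check() is truthy, or False if `timeout` elapses first.

    A transient error (e.g. rate limiting or a dropped connection) counts as
    "not ready yet" and doubles the wait before the next poll, up to
    max_interval; a successful poll resets the wait to `interval`.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        try:
            if check():
                return True
            delay = interval  # healthy response: resume the normal cadence
        except Exception as exc:
            if not is_transient(exc):
                raise  # unexpected errors still fail the run
            delay = min(max_interval, delay * 2)  # back off after an error
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))


def cluster_is_idle(fetch_cluster: Callable[[], dict]) -> bool:
    # Assumes the Get One Cluster JSON exposes the state as "stateName".
    return fetch_cluster().get("stateName") == "IDLE"
```

Resetting the interval after a healthy response keeps the normal polling cadence once the rate limit clears, while the doubling reduces pressure on the shared per-project quota while it is exceeded.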