mongodb-labs/drivers-atlas-testing

Work around rate limits when polling Atlas API endpoints

prashantmital opened this issue

Atlas API resources are rate-limited on a per-project basis. Because every Evergreen build of this project uses the same Atlas project, builds that run simultaneously share a single rate-limit budget and can exceed it.
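Because the limit is shared, the request budget is effectively divided among all builds polling at the same time. The back-of-the-envelope sketch below shows the arithmetic. It assumes a fixed per-minute limit, and the numbers are hypothetical, not Atlas's published limits. The function name is also illustrative only.

```python
def min_poll_interval_seconds(concurrent_builds: int,
                              requests_per_poll: int,
                              limit_per_minute: float) -> float:
    """Smallest polling interval (seconds) that keeps the combined request
    rate of all concurrent builds at or under a shared per-minute limit.

    Each build issues `requests_per_poll` requests every `interval` seconds,
    i.e. 60 * requests_per_poll / interval requests per minute. Summing over
    all builds and requiring the total to be <= the limit gives:
        interval >= 60 * builds * requests_per_poll / limit
    """
    if limit_per_minute <= 0:
        raise ValueError("limit_per_minute must be positive")
    return 60.0 * concurrent_builds * requests_per_poll / limit_per_minute


# Hypothetical numbers: 10 concurrent builds, one GET per poll,
# and an assumed limit of 100 requests per minute per project.
print(min_poll_interval_seconds(10, 1, 100))  # 6.0
```

The number of concurrent builds is not known in advance, so no fixed polling interval can guarantee staying under the limit. Reacting to failures with backoff is therefore still necessary.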

In the absence of backoff/retry logic, hitting the rate limit causes the entire test run to fail with a message like:

INFO:astrolabe.runner:Initializing cluster '420b243009'
INFO:astrolabe.runner:Waiting for a test cluster to become ready
Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 976, in _validate_conn
    conn.connect()
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connection.py", line 308, in connect
    conn = self._new_conn()
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connection.py", line 172, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='cloud.mongodb.com', port=443): Max retries exceeded with url: /api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/atlasclient/client.py", line 210, in request
    response = requests.request(method, url, **request_kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cloud.mongodb.com', port=443): Max retries exceeded with url: /api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "astrolabevenv/bin/astrolabe", line 33, in <module>
    sys.exit(load_entry_point('astrolabe', 'console_scripts', 'astrolabe')())
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabevenv/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/cli.py", line 441, in run_single_test
    failed = runner.run()
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/runner.py", line 323, in run
    args=("IDLE",), kwargs={})
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/poller.py", line 50, in poll
    return_value = self._check_ready(obj, attribute, args, kwargs)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/poller.py", line 67, in _check_ready
    return bool(getattr(obj, attribute)(*args, **kwargs))
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/astrolabe/runner.py", line 91, in is_cluster_state
    cluster_info = self.cluster_url.get().data
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/atlasclient/client.py", line 56, in get
    return self._client.request('GET', self._path, **params)
  File "/data/mci/14a1c0a1a91704cf6d127b1cc65cab0e/astrolabe-src/atlasclient/client.py", line 215, in request
    request_method=method
atlasclient.exceptions.AtlasClientError: HTTPSConnectionPool(host='cloud.mongodb.com', port=443): Max retries exceeded with url: /api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fefa834e9e8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)) (GET https://cloud.mongodb.com/api/atlas/v1.0/groups/5e8e3954fd6ba4520d3f1bbe/clusters/420b243009)
Command failed: error waiting on process '2d258a64-438f-4981-99f6-7403862b2caa': exit status 1
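The traceback above shows that the underlying `requests.exceptions.ConnectionError` is re-raised by `atlasclient` as an `AtlasClientError`. The "During handling of the above exception" lines indicate Python's implicit exception chaining through `__context__`. A retry policy that only catches the outer `AtlasClientError` would therefore need to walk the chain to decide whether the root cause is transient. The following is a minimal, standard-library-only sketch. `find_in_chain` is a hypothetical helper, and the `AtlasClientError` class below is a stand-in, not the real `atlasclient` class. In the real stack one would pass `requests.exceptions.ConnectionError` (and possibly `requests.exceptions.Timeout`). The built-in `ConnectionError` is used here so the example runs without `requests`.

```python
from typing import Optional, Tuple, Type


def find_in_chain(exc: BaseException,
                  types: Tuple[Type[BaseException], ...]) -> Optional[BaseException]:
    """Return the first exception in `exc`'s chain that is an instance of one
    of `types`, or None if there is none.

    Follows explicit chaining (`raise ... from ...` sets `__cause__`) and
    implicit chaining (raising inside an `except` block sets `__context__`).
    The `seen` set guards against cycles in the chain.
    """
    seen = set()
    current: Optional[BaseException] = exc
    while current is not None and id(current) not in seen:
        if isinstance(current, types):
            return current
        seen.add(id(current))
        current = current.__cause__ or current.__context__
    return None


class AtlasClientError(Exception):
    """Hypothetical stand-in for atlasclient.exceptions.AtlasClientError."""


def fake_request():
    # Mimics the traceback: a connection error re-raised as a client error.
    try:
        raise ConnectionError("Temporary failure in name resolution")
    except ConnectionError as e:
        raise AtlasClientError(str(e))  # implicit chaining via __context__


try:
    fake_request()
except AtlasClientError as err:
    root = find_in_chain(err, (ConnectionError, TimeoutError))
    print(type(root).__name__)  # ConnectionError
```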

astrolabe should handle this failure mode by waiting and backing off (then retrying) when such errors are encountered, instead of failing the whole run.
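One common way to do this is capped exponential backoff with jitter. The sketch below is not astrolabe's actual code, and `call_with_backoff` is a hypothetical name. The caller supplies an `is_retryable` predicate. For the Atlas API, this predicate would presumably return True for rate-limit responses (assumed here to be HTTP 429 Too Many Requests) and for connection errors like the one in the traceback above. `sleep` and `rng` are injectable so the behaviour can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(func: Callable[[], T],
                      is_retryable: Callable[[BaseException], bool],
                      max_attempts: int = 6,
                      base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> T:
    """Call `func`, retrying transient failures with capped exponential
    backoff and "full jitter".

    Before retry n (n = 0, 1, 2, ...) we sleep a random duration drawn from
    [0, min(max_delay, base_delay * 2**n)]. Jitter spreads out retries from
    builds that hit the limit at the same moment, so they do not all retry in
    lockstep. Non-retryable errors, and the final failure once `max_attempts`
    is exhausted, are re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises


calls = {"n": 0}


def flaky():
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return {"stateName": "IDLE"}


delays = []
result = call_with_backoff(flaky, lambda e: isinstance(e, ConnectionError),
                           sleep=delays.append, rng=random.Random(0))
print(result, len(delays))  # {'stateName': 'IDLE'} 2
```

"Full jitter" is one of several jitter strategies, chosen here for simplicity. If the API sends a `Retry-After` header with a rate-limit response, honoring it instead of the computed delay would be a natural refinement.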

Note that astrolabe polls the Atlas API for many purposes. The most frequently polled endpoint is Get One Cluster (https://docs.atlas.mongodb.com/reference/api/clusters-get-one/), whose output we use to determine cluster status. Specifically, we glean provisioning status, maintenance status, etc. from clusterState.
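A polling loop over that endpoint might look like the sketch below. This is not astrolabe's `Poller`, and `wait_for_cluster_state` is a hypothetical name. The sketch assumes the cluster state is exposed as the `stateName` field of the JSON response. `IDLE` is the ready state that the traceback's `is_cluster_state` call waits for. `fetch_cluster` is a caller-supplied function that is assumed to GET `/api/atlas/v1.0/groups/{GROUP-ID}/clusters/{CLUSTER-NAME}` and return the decoded body. It is also the natural place to apply retry/backoff. `sleep` and `clock` are injectable for testing.

```python
import time
from typing import Any, Callable, Dict


def wait_for_cluster_state(fetch_cluster: Callable[[], Dict[str, Any]],
                           target_state: str,
                           timeout: float = 1200.0,
                           interval: float = 10.0,
                           sleep: Callable[[float], None] = time.sleep,
                           clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll `fetch_cluster` until the returned document's `stateName` equals
    `target_state`, raising TimeoutError if that does not happen within
    `timeout` seconds. Each iteration issues exactly one request, so
    `interval` directly controls this build's share of the rate limit.
    """
    deadline = clock() + timeout
    while True:
        cluster = fetch_cluster()
        if cluster.get("stateName") == target_state:
            return cluster
        # Give up rather than sleep past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(
                f"cluster did not reach {target_state!r} within {timeout}s "
                f"(last state: {cluster.get('stateName')!r})")
        sleep(interval)


# Usage with a fake endpoint and a fake clock (no network access needed).
states = iter(["CREATING", "CREATING", "IDLE"])
fake_now = [0.0]


def fake_sleep(seconds):
    fake_now[0] += seconds


cluster = wait_for_cluster_state(lambda: {"stateName": next(states)}, "IDLE",
                                 interval=5.0, sleep=fake_sleep,
                                 clock=lambda: fake_now[0])
print(cluster["stateName"], fake_now[0])  # IDLE 10.0
```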