crate/crate-operator

Operator fails to down-scale cluster because of "asynchronous connection attempt underway"

chaudum opened this issue · 4 comments

I attempted to down-scaling a 5node cluster to 3 nodes.
The cluster was empty (no tables).

[2020-11-03 10:52:51,596] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:54:51,660] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:56:51,713] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:58:51,766] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:00:51,865] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:02:51,923] kopf.objects         [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:04:52,035] kopf.objects         [ERROR   ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to scale cluster
Traceback (most recent call last):
  File "main.py", line 472, in cluster_update
    await with_timeout(
  File "main.py", line 107, in with_timeout
    await awaitable
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 491, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 609, in scale_cluster
    total_nodes = await scale_cluster_data_nodes(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 377, in scale_cluster_data_nodes
    new_total_nodes = await scale_up_statefulset(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 226, in scale_up_statefulset
    await wait_for_healthy_cluster(conn_factory, new_total_nodes, logger)
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 189, in wait_for_healthy_cluster
    raise e
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:04:52,037] kopf.objects         [INFO    ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] POSTing to https://bregenz.a1.cratedb.cloud/mgmt/v1/clusters/operator/ because of WebhookEvent.SCALE on cluster 60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402
[2020-11-03 11:04:58,317] kopf.objects         [INFO    ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Successfully POSTed to https://bregenz.a1.cratedb.cloud/mgmt/v1/clusters/operator/ because of WebhookEvent.SCALE on cluster 60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402. Status: 200, Body: '{"success":true}\n'
[2020-11-03 11:04:58,319] kopf.objects         [ERROR   ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Handler 'cluster_update' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/kopf/reactor/handling.py", line 259, in execute_handler_once
    result = await invoke_handler(
  File "/usr/local/lib/python3.8/site-packages/kopf/reactor/handling.py", line 358, in invoke_handler
    result = await invocation.invoke(
  File "/usr/local/lib/python3.8/site-packages/kopf/reactor/invocation.py", line 125, in invoke
    result = await fn(*args, **kwargs)  # type: ignore
  File "main.py", line 472, in cluster_update
    await with_timeout(
  File "main.py", line 107, in with_timeout
    await awaitable
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 491, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 609, in scale_cluster
    total_nodes = await scale_cluster_data_nodes(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 377, in scale_cluster_data_nodes
    new_total_nodes = await scale_up_statefulset(
  File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 226, in scale_up_statefulset
    await wait_for_healthy_cluster(conn_factory, new_total_nodes, logger)
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 189, in wait_for_healthy_cluster
    raise e
  File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
    async with connection_factory() as conn:
  File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
    self._obj = await self._coro
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
    await self._poll(self._waiter, self._timeout)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
    await asyncio.shield(cancel(), loop=self._loop)
  File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
    self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
...

This could be related to aio-libs/aiopg#275

This could be related to aio-libs/aiopg#275

🤔 I was once visiting this issue already. But cannot remember the context.

Replacing localhost with 127.0.0.1 would at least mitigate that issue, even it the root cause of this issue is a different one.

This could be related to aio-libs/aiopg#275

thinking I was once visiting this issue already. But cannot remember the context.

Replacing localhost with 127.0.0.1 would at least mitigate that issue, even it the root cause of this issue is a different one.

We don't use localhost anywhere, though (only when using the crash command). However, we have the same conditions described in aio-libs/aiopg#579

I don't think we've seen this happen for a long time, closing due to inactivity.