Operator fails to down-scale cluster because of "asynchronous connection attempt underway"
chaudum opened this issue · 4 comments
I attempted to down-scaling a 5node cluster to 3 nodes.
The cluster was empty (no tables).
[2020-11-03 10:52:51,596] kopf.objects [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:54:51,660] kopf.objects [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:56:51,713] kopf.objects [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 10:58:51,766] kopf.objects [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:00:51,865] kopf.objects [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:02:51,923] kopf.objects [WARNING ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to connect to cluster
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:04:52,035] kopf.objects [ERROR ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Failed to scale cluster
Traceback (most recent call last):
File "main.py", line 472, in cluster_update
await with_timeout(
File "main.py", line 107, in with_timeout
await awaitable
File "/usr/local/lib/python3.8/asyncio/tasks.py", line 491, in wait_for
return fut.result()
File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 609, in scale_cluster
total_nodes = await scale_cluster_data_nodes(
File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 377, in scale_cluster_data_nodes
new_total_nodes = await scale_up_statefulset(
File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 226, in scale_up_statefulset
await wait_for_healthy_cluster(conn_factory, new_total_nodes, logger)
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 189, in wait_for_healthy_cluster
raise e
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
[2020-11-03 11:04:52,037] kopf.objects [INFO ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] POSTing to https://bregenz.a1.cratedb.cloud/mgmt/v1/clusters/operator/ because of WebhookEvent.SCALE on cluster 60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402
[2020-11-03 11:04:58,317] kopf.objects [INFO ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Successfully POSTed to https://bregenz.a1.cratedb.cloud/mgmt/v1/clusters/operator/ because of WebhookEvent.SCALE on cluster 60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402. Status: 200, Body: '{"success":true}\n'
[2020-11-03 11:04:58,319] kopf.objects [ERROR ] [60ca4860-05c4-46e2-a7f8-3a90c260f8a1/fdc04dad-52f1-4c1e-b84d-d730a7012402] Handler 'cluster_update' failed with an exception. Will retry.
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/kopf/reactor/handling.py", line 259, in execute_handler_once
result = await invoke_handler(
File "/usr/local/lib/python3.8/site-packages/kopf/reactor/handling.py", line 358, in invoke_handler
result = await invocation.invoke(
File "/usr/local/lib/python3.8/site-packages/kopf/reactor/invocation.py", line 125, in invoke
result = await fn(*args, **kwargs) # type: ignore
File "main.py", line 472, in cluster_update
await with_timeout(
File "main.py", line 107, in with_timeout
await awaitable
File "/usr/local/lib/python3.8/asyncio/tasks.py", line 491, in wait_for
return fut.result()
File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 609, in scale_cluster
total_nodes = await scale_cluster_data_nodes(
File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 377, in scale_cluster_data_nodes
new_total_nodes = await scale_up_statefulset(
File "/usr/local/lib/python3.8/site-packages/crate/operator/scale.py", line 226, in scale_up_statefulset
await wait_for_healthy_cluster(conn_factory, new_total_nodes, logger)
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 189, in wait_for_healthy_cluster
raise e
File "/usr/local/lib/python3.8/site-packages/crate/operator/cratedb.py", line 148, in wait_for_healthy_cluster
async with connection_factory() as conn:
File "/usr/local/lib/python3.8/site-packages/aiopg/utils.py", line 94, in __aenter__
self._obj = await self._coro
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 551, in _connect
await self._poll(self._waiter, self._timeout)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 209, in _poll
await asyncio.shield(cancel(), loop=self._loop)
File "/usr/local/lib/python3.8/site-packages/aiopg/connection.py", line 195, in cancel
self._conn.cancel()
psycopg2.OperationalError: asynchronous connection attempt underway
...
This could be related to aio-libs/aiopg#275
This could be related to aio-libs/aiopg#275
🤔 I was once visiting this issue already. But cannot remember the context.
Replacing localhost
with 127.0.0.1
would at least mitigate that issue, even it the root cause of this issue is a different one.
This could be related to aio-libs/aiopg#275
thinking I was once visiting this issue already. But cannot remember the context.
Replacing
localhost
with127.0.0.1
would at least mitigate that issue, even it the root cause of this issue is a different one.
We don't use localhost
anywhere, though (only when using the crash
command). However, we have the same conditions described in aio-libs/aiopg#579
I don't think we've seen this happen for a long time, closing due to inactivity.