Unstable test - TestTestgresCommon::test_pg_ctl_wait_option[remote]
Closed this issue · 5 comments
dmitry-lipetsk commented
https://app.travis-ci.com/github/postgrespro/testgres/jobs/631955014
I think, we should wait for creation of postmaster.pid and after this begin stopping our node.
______________ TestTestgresCommon.test_pg_ctl_wait_option[remote] ______________
[gw1] linux -- Python 3.12.3 /tmp/testgres_venv/bin/python
self = <testgres.tests.test_testgres_common.TestTestgresCommon object at 0x7efe130f0470>
node_svc = <testgres.tests.helpers.global_data.PostgresNodeService object at 0x7efe13e72780>
def test_pg_ctl_wait_option(self, node_svc: PostgresNodeService):
assert isinstance(node_svc, PostgresNodeService)
C_MAX_ATTEMPTS = 50
node = __class__.helper__get_node(node_svc)
assert node.status() == NodeStatus.Uninitialized
node.init()
assert node.status() == NodeStatus.Stopped
node.start(wait=False)
nAttempt = 0
while True:
if nAttempt == C_MAX_ATTEMPTS:
#
# [2025-03-11]
# We have an unexpected problem with this test in CI
# Let's get an additional information about this test failure.
#
logging.error("Node was not stopped.")
if not node.os_ops.path_exists(node.pg_log_file):
logging.warning("Node log does not exist.")
else:
logging.info("Let's read node log file [{0}]".format(node.pg_log_file))
logFileData = node.os_ops.read(node.pg_log_file, binary=False)
logging.info("Node log file content:\n{0}".format(logFileData))
> raise Exception("Could not stop node.")
E Exception: Could not stop node.
C_MAX_ATTEMPTS = 50
__class__ = <class 'testgres.tests.test_testgres_common.TestTestgresCommon'>
logFileData = '2025-04-05 19:38:19.256 GMT [29667] LOG: starting PostgreSQL 17.4 (Ubuntu 17.4-1.pgdg24.04+2) on x86_64-pc-linux-gnu... FATAL: could not create any TCP/IP sockets\n2025-04-05 19:38:19.261 GMT [29667] LOG: database system is shut down\n'
nAttempt = 50
node = PostgresNode(name='testgres-2811a483-a174-42ac-ba1d-f27e23be4387', port=57388, base_dir='/tmp/tgsn_RcX1GQ')
node_svc = <testgres.tests.helpers.global_data.PostgresNodeService object at 0x7efe13e72780>
self = <testgres.tests.test_testgres_common.TestTestgresCommon object at 0x7efe130f0470>
tests/test_testgres_common.py:801: Exception
------------------------------ Captured log call -------------------------------
INFO root:test_testgres_common.py:810 Try to stop node. Attempt #1.
INFO root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO root:test_testgres_common.py:806 Wait 1 second.
INFO root:test_testgres_common.py:808
INFO root:test_testgres_common.py:810 Try to stop node. Attempt #2.
INFO root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO root:test_testgres_common.py:806 Wait 1 second.
INFO root:test_testgres_common.py:808
INFO root:test_testgres_common.py:810 Try to stop node. Attempt #3.
INFO root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO root:test_testgres_common.py:806 Wait 1 second.
INFO root:test_testgres_common.py:808
INFO root:test_testgres_common.py:810 Try to stop node. Attempt #4.
INFO root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO root:test_testgres_common.py:806 Wait 1 second.
INFO root:test_testgres_common.py:808
INFO root:test_testgres_common.py:810 Try to stop node. Attempt #5.
INFO root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
.....
dmitry-lipetsk commented
I understood.
https://app.travis-ci.com/github/postgrespro/testgres/jobs/632045028
- We start node without wait
- Node can't start because it has a conflict with another (parallel) instance.
- We can't help him
---- Error:
pg_ctl: PID file "/tmp/tgsn_iMofAp/data/postmaster.pid" does not exist
Is server running?
ERROR root:conftest.py:626 Node was not stopped.
INFO root:test_testgres_common.py:822 Let's read node log file [/tmp/tgsn_iMofAp/logs/postgresql.log]
INFO root:test_testgres_common.py:824 Node log file content:
2025-04-09 11:31:38.253 GMT [27196] LOG: starting PostgreSQL 17.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 14.2.0) 14.2.0, 64-bit
2025-04-09 11:31:38.253 GMT [27196] LOG: could not bind IPv4 address "127.0.0.1": Address in use
2025-04-09 11:31:38.253 GMT [27196] HINT: Is another postmaster already running on port 54302? If not, wait a few seconds and retry.
2025-04-09 11:31:38.253 GMT [27196] WARNING: could not create listen socket for "127.0.0.1"
2025-04-09 11:31:38.253 GMT [27196] FATAL: could not create any TCP/IP sockets
2025-04-09 11:31:38.255 GMT [27196] LOG: database system is shut down
=========================== short test summary info ============================
FAILED tests/test_testgres_common.py::TestTestgresCommon::test_pg_ctl_wait_option[remote] - Exception: Could not stop node.
======= 1 failed, 265 passed, 4 skipped, 1 xfailed in 638.01s (0:10:38) ========
Command exited with non-zero status 1
real 10m 38.50s
user 12m 27.05s
sys 1m 36.15s
The command "docker run $(bash <(curl -s https://codecov.io/env)) -t tests" exited with 1.
Solution
Port manager should use global (host wide) locks for reserved ports.
dmitry-lipetsk commented
https://app.travis-ci.com/github/postgrespro/testgres/jobs/632597068
ERROR root:conftest.py:644 Node was not stopped.
INFO root:test_testgres_common.py:915 Let's read node log file [/tmp/tgsn_EMLgnh/logs/postgresql.log]
INFO root:test_testgres_common.py:917 Node log file content:
2025-05-04 20:40:53.490 GMT [28164] LOG: starting PostgreSQL 17.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 14.2.0) 14.2.0, 64-bit
2025-05-04 20:40:53.490 GMT [28164] LOG: could not bind IPv4 address "127.0.0.1": Address in use
2025-05-04 20:40:53.490 GMT [28164] HINT: Is another postmaster already running on port 58514? If not, wait a few seconds and retry.
2025-05-04 20:40:53.491 GMT [28164] WARNING: could not create listen socket for "127.0.0.1"
2025-05-04 20:40:53.491 GMT [28164] FATAL: could not create any TCP/IP sockets
2025-05-04 20:40:53.495 GMT [28164] LOG: database system is shut down
=========================== short test summary info ============================
FAILED tests/test_testgres_common.py::TestTestgresCommon::test_pg_ctl_wait_option[remote] - Exception: Could not stop node.
dmitry-lipetsk commented
dmitry-lipetsk commented
Test must analyze postgresql.log and restart if a port conflict is detected.
dmitry-lipetsk commented
Was close via #257
