postgrespro/testgres

Unstable test - TestTestgresCommon::test_pg_ctl_wait_option[remote]

Closed this issue · 5 comments

https://app.travis-ci.com/github/postgrespro/testgres/jobs/631955014

I think, we should wait for creation of postmaster.pid and after this begin stopping our node.

______________ TestTestgresCommon.test_pg_ctl_wait_option[remote] ______________
[gw1] linux -- Python 3.12.3 /tmp/testgres_venv/bin/python
self = <testgres.tests.test_testgres_common.TestTestgresCommon object at 0x7efe130f0470>
node_svc = <testgres.tests.helpers.global_data.PostgresNodeService object at 0x7efe13e72780>
    def test_pg_ctl_wait_option(self, node_svc: PostgresNodeService):
        assert isinstance(node_svc, PostgresNodeService)
        C_MAX_ATTEMPTS = 50
    
        node = __class__.helper__get_node(node_svc)
        assert node.status() == NodeStatus.Uninitialized
        node.init()
        assert node.status() == NodeStatus.Stopped
        node.start(wait=False)
        nAttempt = 0
        while True:
            if nAttempt == C_MAX_ATTEMPTS:
                #
                # [2025-03-11]
                #  We have an unexpected problem with this test in CI
                #  Let's get an additional information about this test failure.
                #
                logging.error("Node was not stopped.")
                if not node.os_ops.path_exists(node.pg_log_file):
                    logging.warning("Node log does not exist.")
                else:
                    logging.info("Let's read node log file [{0}]".format(node.pg_log_file))
                    logFileData = node.os_ops.read(node.pg_log_file, binary=False)
                    logging.info("Node log file content:\n{0}".format(logFileData))
    
>               raise Exception("Could not stop node.")
E               Exception: Could not stop node.
C_MAX_ATTEMPTS = 50
__class__  = <class 'testgres.tests.test_testgres_common.TestTestgresCommon'>
logFileData = '2025-04-05 19:38:19.256 GMT [29667] LOG:  starting PostgreSQL 17.4 (Ubuntu 17.4-1.pgdg24.04+2) on x86_64-pc-linux-gnu... FATAL:  could not create any TCP/IP sockets\n2025-04-05 19:38:19.261 GMT [29667] LOG:  database system is shut down\n'
nAttempt   = 50
node       = PostgresNode(name='testgres-2811a483-a174-42ac-ba1d-f27e23be4387', port=57388, base_dir='/tmp/tgsn_RcX1GQ')
node_svc   = <testgres.tests.helpers.global_data.PostgresNodeService object at 0x7efe13e72780>
self       = <testgres.tests.test_testgres_common.TestTestgresCommon object at 0x7efe130f0470>
tests/test_testgres_common.py:801: Exception
------------------------------ Captured log call -------------------------------
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #1.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #2.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #3.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #4.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
---- Error:
pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?
INFO     root:test_testgres_common.py:806 Wait 1 second.
INFO     root:test_testgres_common.py:808 
INFO     root:test_testgres_common.py:810 Try to stop node. Attempt #5.
INFO     root:test_testgres_common.py:818 Node is not stopped. Exception (ExecUtilException): Utility exited with non-zero code (1). Error: `pg_ctl: PID file "/tmp/tgsn_RcX1GQ/data/postmaster.pid" does not exist
Is server running?`
Command: ('/usr/lib/postgresql/17/bin/pg_ctl -D /tmp/tgsn_RcX1GQ/data -W stop',)
Exit code: 1
.....

I understood.

https://app.travis-ci.com/github/postgrespro/testgres/jobs/632045028

  • We start node without wait
  • Node can't start because it has a conflict with another (parallel) instance.
  • We can't help him
---- Error:
pg_ctl: PID file "/tmp/tgsn_iMofAp/data/postmaster.pid" does not exist
Is server running?
ERROR    root:conftest.py:626 Node was not stopped.
INFO     root:test_testgres_common.py:822 Let's read node log file [/tmp/tgsn_iMofAp/logs/postgresql.log]
INFO     root:test_testgres_common.py:824 Node log file content:
2025-04-09 11:31:38.253 GMT [27196] LOG:  starting PostgreSQL 17.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 14.2.0) 14.2.0, 64-bit
2025-04-09 11:31:38.253 GMT [27196] LOG:  could not bind IPv4 address "127.0.0.1": Address in use
2025-04-09 11:31:38.253 GMT [27196] HINT:  Is another postmaster already running on port 54302? If not, wait a few seconds and retry.
2025-04-09 11:31:38.253 GMT [27196] WARNING:  could not create listen socket for "127.0.0.1"
2025-04-09 11:31:38.253 GMT [27196] FATAL:  could not create any TCP/IP sockets
2025-04-09 11:31:38.255 GMT [27196] LOG:  database system is shut down
=========================== short test summary info ============================
FAILED tests/test_testgres_common.py::TestTestgresCommon::test_pg_ctl_wait_option[remote] - Exception: Could not stop node.
======= 1 failed, 265 passed, 4 skipped, 1 xfailed in 638.01s (0:10:38) ========
Command exited with non-zero status 1
real	10m 38.50s
user	12m 27.05s
sys	1m 36.15s
The command "docker run $(bash <(curl -s https://codecov.io/env)) -t tests" exited with 1.

Solution

Port manager should use global (host wide) locks for reserved ports.

#245

https://app.travis-ci.com/github/postgrespro/testgres/jobs/632597068

ERROR    root:conftest.py:644 Node was not stopped.
INFO     root:test_testgres_common.py:915 Let's read node log file [/tmp/tgsn_EMLgnh/logs/postgresql.log]
INFO     root:test_testgres_common.py:917 Node log file content:
2025-05-04 20:40:53.490 GMT [28164] LOG:  starting PostgreSQL 17.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 14.2.0) 14.2.0, 64-bit
2025-05-04 20:40:53.490 GMT [28164] LOG:  could not bind IPv4 address "127.0.0.1": Address in use
2025-05-04 20:40:53.490 GMT [28164] HINT:  Is another postmaster already running on port 58514? If not, wait a few seconds and retry.
2025-05-04 20:40:53.491 GMT [28164] WARNING:  could not create listen socket for "127.0.0.1"
2025-05-04 20:40:53.491 GMT [28164] FATAL:  could not create any TCP/IP sockets
2025-05-04 20:40:53.495 GMT [28164] LOG:  database system is shut down
=========================== short test summary info ============================
FAILED tests/test_testgres_common.py::TestTestgresCommon::test_pg_ctl_wait_option[remote] - Exception: Could not stop node.

This problem on the local system (docker, py 3.8, alpine)

Image

Test must analyze postgresql.log and restart if a port conflict is detected.

Was close via #257