EnterpriseDB/repmgr

Why does `node rejoin` try to connect to the wrong upstream?

Closed · 1 comment

postgres@sa293a:~ $ repmgr --version
repmgr 5.3.3

With a two-node cluster running, I force a failover by shutting down the primary. The failover succeeds, after which the cluster is in the following state:

postgres@sa293b:~ $ repmgr cluster show
 ID | Name   | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+--------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------
 1  | sa293a | primary | - failed  | ?        | default  | 100      |          | host=sa293a port=5432 dbname=repmgr user=repmgr  application_name=sa293a connect_timeout=2
 2  | sa293b | primary | * running |          | default  | 100      | 10       | host=sa293b port=5432 dbname=repmgr user=repmgr  application_name=sa293b connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "sa293a" (ID: 1)

This is what I expect. The node state in the repmgr database also looks correct:

repmgr=# SELECT node_id
repmgr-#          FROM repmgr.nodes
repmgr-#         WHERE type = 'primary'
repmgr-#           AND active IS TRUE ;
 node_id
---------
       2
(1 row)

repmgr=# SELECT * FROM repmgr.nodes n  WHERE n.node_id = 2;
-[ RECORD 1 ]----+---------------------------------------------------------------------------------------------
node_id          | 2
upstream_node_id |
active           | t
node_name        | sa293b
type             | primary
location         | default
priority         | 100
conninfo         | host=sa293b port=5432 dbname=repmgr user=repmgr  application_name=sa293b connect_timeout=2
repluser         | repmgr
slot_name        | repmgr_slot_2
config_file      | /etc/repmgr/14/repmgr.conf

I then ask the former primary (sa293a) to rejoin the cluster via the new primary (sa293b). Note that I have an Ansible task for this, so the command is run via Ansible:

    "cmd": "/usr/pgsql-14/bin/repmgr node rejoin -f /etc/repmgr/14/repmgr.conf -h sa293b -d repmgr --force-rewind\n",

However, the rejoin fails because repmgr tries to connect to the former primary instead:

    "stderr_lines": [
        "WARNING: database is not shut down cleanly",
        "DETAIL: --force-rewind provided, pg_rewind will automatically perform recovery",
        "NOTICE: rejoin target is node \"sa293a\" (ID: 1)",
        "ERROR: connection to database failed",
        "DETAIL: ",
        "connection to server at \"sa293a\" (172.17.0.2), port 5432 failed: Connection refused",
        "\tIs the server running on that host and accepting TCP/IP connections?",
        "",
        "DETAIL: attempted to connect using:",
        "  user=repmgr connect_timeout=2 dbname=repmgr host=sa293a port=5432 application_name=sa293a fallback_application_name=repmgr options=-csearch_path=",
        "ERROR: unable to connect to current registered primary \"sa293a\" (ID: 1)",
        "DETAIL: registered primary node conninfo is: \"host=sa293a port=5432 dbname=repmgr user=repmgr  application_name=sa293a connect_timeout=2\""
    ],

Why would repmgr use sa293a as the rejoin target when sa293b is specified on the command line and sa293b is recorded as the primary in repmgr.nodes?

Looks like it's a timing issue: the playbook tries to rejoin the failed node too soon after the failover, before repmgr.nodes has been updated with the new state of the cluster. The host passed with -h is not the rejoin target itself; repmgr only connects there to read the cluster metadata, then rejoins whichever node is registered as the active primary in repmgr.nodes. If that table still lists sa293a when the command runs, sa293a becomes the rejoin target.
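
One way to avoid this is to have the playbook wait until the metadata reflects the new primary before attempting the rejoin. Below is a minimal sketch of such a guard task, assuming node 2 (sa293b) is the expected primary and that psql can authenticate non-interactively (for example via .pgpass); the task name and retry counts are arbitrary:

    # Hypothetical guard task: poll repmgr.nodes on the new primary until it
    # reports node 2 as the active primary, then proceed to the rejoin task.
    - name: Wait for repmgr metadata to reflect the new primary
      ansible.builtin.command:
        cmd: >
          psql -h sa293b -U repmgr -d repmgr -At
          -c "SELECT node_id FROM repmgr.nodes WHERE type = 'primary' AND active IS TRUE"
      register: active_primary
      until: active_primary.stdout == "2"
      retries: 10
      delay: 5
      changed_when: false

With retries: 10 and delay: 5 this allows up to about 50 seconds for the failover metadata to settle before the play fails; adjust to however long a failover takes in your environment.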