Getting error when rejoining a standby to replication cluster

Question

Closed this issue 3 years ago · 1 comments

I am testing with a simple two-node cluster: one node is PG-Master and the other is PG-replica.

PostgreSQL : 14.2
Ubuntu : 20.04
repmgr : 5.3.1

I set up replication successfully between the two servers. After that, I shut down both servers and started them back up again. Now the cluster show command gives the warning below.

WARNING: node "PG-replica" not found in "pg_stat_replication"

So I am trying to rejoin PG-replica by running the command below:

repmgr -f /etc/repmgr/repmgr.conf node rejoin -d 'host=pg-master-ip user=repmgr dbname=repmgr connect_timeout=2'

I get the following output:

NOTICE: rejoin target is node "PG-Master" (ID: 1)
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 0/D001310, rejoin target lsn is 0/E0008A0
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=pg-replica-ip user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "pg_ctl  -w -D '/var/lib/postgresql/14/main' start"
ERROR: unable to start server
ERROR: NODE REJOIN failed

I suspect that repmgr is trying to start PostgreSQL using pg_ctl, which is not available in the Debian package installation of PostgreSQL; the pg_ctlcluster wrapper is used instead.

Am I doing something wrong here? Please help.

Answer 1 · 2022-02-28T17:52:03.000Z

This is not an issue. The service_(start|stop|restart|reload)_command settings in repmgr.conf can be used to set custom commands for starting and stopping the PostgreSQL cluster. On Debian, you can use pg_ctlcluster instead of pg_ctl.
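
For illustration, a repmgr.conf fragment along these lines might look like the sketch below. It rests on several assumptions: the cluster is the default Debian/Ubuntu one (version 14, name "main", inferred from the data directory /var/lib/postgresql/14/main in the question), pg_ctlcluster is installed at /usr/bin/pg_ctlcluster, and the user running repmgr is allowed to run these commands through sudo without a password. Adjust the version, cluster name and paths to match your installation.

```ini
# /etc/repmgr/repmgr.conf (fragment)
# Assumption: Debian/Ubuntu cluster "14 main"; change version and name if yours differ.
# Assumption: the repmgr user has passwordless sudo rights for these exact commands.
service_start_command   = 'sudo /usr/bin/pg_ctlcluster 14 main start'
service_stop_command    = 'sudo /usr/bin/pg_ctlcluster 14 main stop'
service_restart_command = 'sudo /usr/bin/pg_ctlcluster 14 main restart'
service_reload_command  = 'sudo /usr/bin/pg_ctlcluster 14 main reload'
```

With settings like these in place on the standby, rerunning the node rejoin command from the question should make repmgr call pg_ctlcluster instead of pg_ctl when it starts the server.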