EnterpriseDB/repmgr

"node check" incorrectly reports node is not attached to upstream while in state "catchup"


repmgr 5.3.0, PostgreSQL 14.1, Ubuntu 18.04

I have one standby node attached to a primary. The standby fell behind during a period of heavy writes on the primary, so I rebooted the standby. After the reboot, the standby took some time to reappear in pg_stat_replication on the primary, but it eventually did reappear there with state = "catchup". During this period, repmgr node check incorrectly reports that the standby is not attached to its upstream node at all:

# sudo -u postgres repmgr -f "/etc/postgresql/14/behavior/repmgr.conf" node check
WARNING: node "(MY_STANDBY_NODE)" attached in state "catchup"
Node "(MY_STANDBY_NODE)":
	Server role: OK (node is standby)
	Replication lag: CRITICAL (2095 seconds, critical threshold: 600))
	WAL archiving: OK (0 pending archive ready files)
	Upstream connection: CRITICAL (node "(MY_STANDBY_NODE)" (ID: 2) is not attached to expected upstream node "(MY_PRIMARY_NODE)" (ID: 1))
...

repmgr cluster show, however, reports the situation correctly:

# sudo -u postgres repmgr -f "/etc/postgresql/14/behavior/repmgr.conf" cluster show
WARNING: node "(MY_STANDBY_NODE)" attached in state "catchup"
...
WARNING: following issues were detected
  - node "(MY_STANDBY_NODE)" (ID: 2) attached to its upstream node "(MY_PRIMARY_NODE)" (ID: 1) in state "catchup"

I rebooted the standby twice, and this happened both times. (The warning eventually disappears once pg_stat_replication.state changes from "catchup" to "streaming".)
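To illustrate the behavior one might expect from node check, here is a minimal Python sketch of an upstream-attachment check. It is not repmgr's actual implementation (repmgr is written in C). The function `classify_upstream` and the `Attachment` enum are hypothetical names, and the sketch assumes the standby's `application_name` in the upstream's pg_stat_replication matches its repmgr node name. The idea is that any row for the node, whatever its state, means a WAL sender exists, so the node should count as attached; only a non-"streaming" state would warrant a warning.

```python
from enum import Enum


class Attachment(Enum):
    # Hypothetical result categories for an upstream-connection check.
    ATTACHED = "attached"
    ATTACHED_WARNING = "attached (not streaming)"
    NOT_ATTACHED = "not attached"


def classify_upstream(rows, node_name):
    """Classify a standby's attachment from upstream pg_stat_replication rows.

    rows: iterable of (application_name, state) tuples, as returned by
          SELECT application_name, state FROM pg_stat_replication
          run on the upstream node.
    node_name: the standby's name; ASSUMED to equal its application_name.
    """
    for app_name, state in rows:
        if app_name != node_name:
            continue
        if state == "streaming":
            return Attachment.ATTACHED
        # Any other state (e.g. "startup", "catchup", "backup", "stopping")
        # still means a WAL sender exists for this node: it is attached,
        # just not streaming normally yet, so report a warning instead.
        return Attachment.ATTACHED_WARNING
    # No row for this node at all: it really is not attached.
    return Attachment.NOT_ATTACHED
```

With a single row `("MY_STANDBY_NODE", "catchup")`, this sketch returns `Attachment.ATTACHED_WARNING`, which matches what cluster show reports above rather than node check's "not attached" result.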