"node check" incorrectly reports node is not attached to upstream while in state "catchup"

Opened this issue 2 years ago · 0 comments

repmgr 5.3.0, PostgreSQL 14.1, Ubuntu 18.04

I have one standby node attached to a primary. The standby fell behind during a period of heavy writes on the primary, and I rebooted the standby. After the reboot, the standby took some time to reappear in pg_stat_replication, but it eventually did, with state="catchup". While the standby is in this state, repmgr node check incorrectly reports that it is not attached to the upstream node at all:

# sudo -u postgres repmgr -f "/etc/postgresql/14/behavior/repmgr.conf" node check
WARNING: node "(MY_STANDBY_NODE)" attached in state "catchup"
Node "(MY_STANDBY_NODE)":
	Server role: OK (node is standby)
	Replication lag: CRITICAL (2095 seconds, critical threshold: 600))
	WAL archiving: OK (0 pending archive ready files)
	Upstream connection: CRITICAL (node "(MY_STANDBY_NODE)" (ID: 2) is not attached to expected upstream node "(MY_PRIMARY_NODE)" (ID: 1))
...

However, repmgr cluster show reports the correct warning:

# sudo -u postgres repmgr -f "/etc/postgresql/14/behavior/repmgr.conf" cluster show
WARNING: node "(MY_STANDBY_NODE)" attached in state "catchup"
...
WARNING: following issues were detected
  - node "(MY_STANDBY_NODE)" (ID: 2) attached to its upstream node "(MY_PRIMARY_NODE)" (ID: 1) in state "catchup"

I rebooted the standby twice and this happened both times. (The warning eventually disappears when pg_stat_replication.state changes from "catchup" to "streaming".)
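
To make the expected behavior concrete, here is a minimal sketch of how I would expect a pg_stat_replication.state value to map to an attachment status. This is not repmgr's implementation: the function names `classify_replication_state` and `standby_state` are hypothetical, the status strings are my own, and the query assumes the standby's application_name matches its node name.

```shell
#!/bin/sh
# Hypothetical helper (not part of repmgr): map a pg_stat_replication.state
# value to the attachment status I would expect a check to report.
classify_replication_state() {
    case "$1" in
        streaming)
            echo "OK: attached (streaming)" ;;
        startup|catchup|backup|stopping)
            # A row exists in pg_stat_replication, so the standby is attached,
            # just not yet (or no longer) streaming.
            echo "WARNING: attached in state \"$1\"" ;;
        "")
            echo "CRITICAL: not attached (no row in pg_stat_replication)" ;;
        *)
            echo "UNKNOWN: unexpected state \"$1\"" ;;
    esac
}

# Run on the primary. Assumes the standby's application_name matches its
# node name; adjust the filter if primary_conninfo sets something else.
standby_state() {
    psql -At -c "SELECT state FROM pg_stat_replication WHERE application_name = '$1'"
}

# Example (requires a running primary):
#   classify_replication_state "$(standby_state MY_STANDBY_NODE)"
```

With this mapping, a standby in "catchup" would produce a warning rather than a CRITICAL "not attached" result, which matches what cluster show already reports.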