EnterpriseDB/repmgr

repmgr version 5.2.0: PostgreSQL stop causes repmgr to stop

Closed this issue · 5 comments

Hello everyone.

When the parameter "standby_disconnect_on_failover = true" is set, stopping the PostgreSQL service causes the repmgrd service to stop as well.
After starting the PostgreSQL service again with the "systemctl start postgresql" command, I tried to restart the repmgrd service manually,
but without success: repmgrd did not start. Only after registering the standby with the cluster was I able to restart the repmgrd service.
I think this is a bug, because in version 5.1.0 repmgrd did not stop after PostgreSQL was stopped.
I am attaching a file with the sequence of steps to reproduce this problem.
postgresql_zabservtstpve_conf.txt
repmgr_zabproxynode1_conf.txt
repmgr_zabservtstnode1_conf.txt
repmgr_zabservtstpve_conf.txt
my_systems.txt
postgresql_zabproxynode1_conf.txt
postgresql_zabservtstnode1_conf.txt
PostgreSQL_stop_causes_repmgr_to_stop.txt

Thanks for the detailed report, much appreciated.

On investigation it turns out there was a corner-case bug where repmgrd would terminate if:

  • standby_disconnect_on_failover was set to true
  • the local (standby) node was not running when the primary node was shut down

This issue is also present in repmgr 5.1 and probably all versions since standby_disconnect_on_failover was added. Commit 8f7a32a fixes this.
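As a rough illustration of the static part of this (the parameter setting plus the installed version), the following Python sketch flags a node that could run into the bug. The helper names and the set of accepted boolean spellings are assumptions made for this example and are not part of repmgr. The second condition, the local node not running when the primary is shut down, is a runtime state and is not checked here.

```python
import re

# First release reported in this thread to contain the fix.
FIXED_VERSION = (5, 2, 1)

# Assumption for this sketch only: spellings treated as "true".
TRUE_VALUES = {"true", "on", "yes", "1"}


def parse_version(text):
    """Turn '5.2.0' or a package version like '5.2.1-1.buster+1' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def read_setting(conf_text, name):
    """Return the last value assigned to `name` in repmgr.conf-style text, or None."""
    value = None
    for line in conf_text.splitlines():
        # Naive comment stripping; adequate for a boolean setting.
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == name:
            value = raw.strip().strip("'\"")
    return value


def may_hit_bug(conf_text, version_text):
    """True if the setting is enabled and the version predates the fix."""
    setting = read_setting(conf_text, "standby_disconnect_on_failover")
    enabled = setting is not None and setting.lower() in TRUE_VALUES
    return enabled and parse_version(version_text) < FIXED_VERSION
```

For example, `may_hit_bug("standby_disconnect_on_failover = true", "5.2.0")` evaluates to `True`, while the same configuration with `"5.2.1"` evaluates to `False`. No lower version bound is applied, because the thread only says the bug is probably present in all versions since the parameter was added.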

Hi ibarwick.
I ran an apt upgrade on the cluster nodes yesterday, rebooted them, and tested whether the error still occurs on my cluster. It does. The files are attached.

Maybe I upgraded repmgr incorrectly?
repmgr_Master.log
repmgr_Standby.log
Listing_from_Master_08122020.txt

I will now run another test and report the result.

If you upgraded packages yesterday, you presumably still have 5.2.0; we have only just published 5.2.1 packages (currently available from 2ndQuadrant repositories, not PGDG) which contain the fix.

Yes, exactly. The 5.2.1 packages appeared today.
postgresql-12-repmgr/buster-2ndquadrant 5.2.1-1.buster+1 amd64 [upgradable from: 5.2.0-2.pgdg100+1]
repmgr-common/buster-2ndquadrant 5.2.1-1.buster+1 all [upgradable from: 5.2.0-2.pgdg100+1]
ibarwick, can you tell me how to install it? As a minor-version upgrade or as a major-version upgrade?

Hello ibarwick.
I installed repmgr version 5.2.1 today and ran two tests.
Test number 1:
I stopped the PostgreSQL service and immediately disabled the Ethernet link (eth2). This is what my fencing script does: when PostgreSQL fails on the Master, it forcibly disconnects the eth interface to fence the failed Master, which is then turned into a Standby.
Test result: OK!

Test number 2:
In this test, I stopped only the PostgreSQL service (I did not disable the Ethernet interface).
Test result: OK!
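The check behind test 2 (repmgrd should keep running on a node whose PostgreSQL service has been stopped) could be scripted roughly as below. This is only a sketch: the unit names `postgresql` and `repmgrd` are taken from this thread and may differ on other systems, and the `run` and `settle_seconds` parameters are hypothetical additions so the logic can be tried without touching real services.

```python
import subprocess
import time


def systemctl(*args):
    """Run systemctl with the given arguments; return (exit_code, stdout)."""
    result = subprocess.run(["systemctl", *args], capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def repmgrd_survives_postgres_stop(run=systemctl, pg_unit="postgresql",
                                   repmgrd_unit="repmgrd", settle_seconds=30):
    """Stop PostgreSQL, wait, then report whether repmgrd is still active."""
    code, _ = run("stop", pg_unit)
    if code != 0:
        raise RuntimeError(f"could not stop unit {pg_unit!r}")
    # Give repmgrd time to notice that the local node is down.
    time.sleep(settle_seconds)
    # 'systemctl is-active' prints the unit state, e.g. 'active' or 'inactive'.
    _, state = run("is-active", repmgrd_unit)
    return state == "active"
```

Passing a stand-in for `run` that returns canned results makes it possible to exercise the function on a machine without systemd; on a real node it needs sufficient privileges to stop the service.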

I am attaching a test report and the repmgr logs (maybe they will be useful to someone).

Note: I noticed that the Standby logs from test 1 and test 2 differ.

I am closing the issue as solved. Many thanks to you and your team for fixing this error. The fix will make it a bit easier for my script to fence the faulty Master and turn it into a Standby.
repmgr_Standby_09122020_stop_postgresql.log
repmgr_Standby_09122020_stop_postgresql_and_ifdown_eth.log
repmgrp_Master_09122020_stop_postgresql.log
Listing_from_Standby.txt
repmgr_Master_09122020_stop_postgresql_and_ifdown_eth.log