EnterpriseDB/repmgr

promoting standby does not change role

Closed this issue · 7 comments

Hello,

If the primary fails and repmgr promotes the standby, "repmgr cluster show" still shows the standby's role as standby, but with the status
! running as primary
How do I fix this issue, and why wouldn't the role be changed to primary?
I have repmgr 4.3 for PostgreSQL 11.

  • which node are you running repmgr cluster show on?
  • is the former primary still running?

You will see this output if a standby was promoted, but (for whatever reason) the original primary is still running and you run repmgr cluster show on the original primary, as it's not expecting to see its former standby running as primary.
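One way to spot this situation on each node is to filter the cluster show output for the status text. The following is only a minimal sketch: the function name find_role_mismatch is made up for illustration, and the sample rows are hand-written, not captured from a real cluster.

```shell
#!/bin/sh
# Sketch: flag nodes whose recorded role disagrees with their actual state.
# Reads `repmgr cluster show` output on stdin. The status text matched here
# ("! running as primary") is the one reported in this thread.
find_role_mismatch() {
    grep -F '! running as primary'
}

# Illustrative input only (hypothetical node names, hand-written rows).
sample=' 1 | node1 | primary | * running            |
 2 | node2 | standby | ! running as primary |'

# Prints only the row for the node reported as running as primary.
printf '%s\n' "$sample" | find_role_mismatch
```

On a real node the input would come from the command itself, for example by piping repmgr cluster show into find_role_mismatch. Running it on both nodes shows which node's metadata is out of date.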

If this is the case, you may be able to rejoin the former primary as a standby of the new primary by executing repmgr node rejoin, possibly using the --force-rewind option if available and appropriate.

Documentation: repmgr node rejoin
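As a rough illustration of the rejoin step, the sketch below only prints the command instead of running it. The config path, the host name node2 and the conninfo values (dbname=repmgr, user=repmgr) are assumptions, and rejoin_cmd is a hypothetical helper. Note that repmgr node rejoin expects PostgreSQL on the former primary to be shut down, and --force-rewind relies on pg_rewind, which needs wal_log_hints or data checksums enabled.

```shell
#!/bin/sh
# Sketch: build (but do not execute) a rejoin command for a former primary.
# Assumed values: config path, conninfo contents, host name of the new primary.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

rejoin_cmd() {
    # $1: host name of the current (newly promoted) primary
    # $2: optional extra flag, e.g. --dry-run
    new_primary="$1"
    extra="$2"
    printf "repmgr -f %s node rejoin -d 'host=%s dbname=repmgr user=repmgr' --force-rewind --verbose%s\n" \
        "$REPMGR_CONF" "$new_primary" "${extra:+ $extra}"
}

# Print the --dry-run variant first, so prerequisites can be checked
# before anything is changed on the node.
rejoin_cmd node2 --dry-run
```

If the dry run reports no problems, the same command without --dry-run would perform the actual rejoin. Check the node rejoin documentation for your repmgr version before relying on any of these options.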

If I remove the promotion and automatic failover settings from /etc/repmgr.conf, the standby's status changes to "! running as primary" but its role remains standby.
Then when I try

repmgr standby promote

it says
ERROR!: STANDBY PROMOTE can only be used on a standby node

The configuration for my setup is taken from this

https://medium.com/@victor.boissiere/how-to-setup-postgresql-cluster-with-repmgr-febc2f10c243

What exactly is triggering this status change? If I can keep the setup intact, I will be able to try out, line by line on the command line, what repmgr.conf specifies.

I found that repmgr standby promote does change the standby to primary, but as soon as the original primary comes back, the standby's role changes from primary back to standby and its status says it is running as primary. In my setup the primary may crash often but will seldom go down permanently. It may also be rebooted during a software upgrade, which means it is going to come back.

@kamal-prasad in that case, the correct way to bring it back should be repmgr node rejoin -h current_primary as Ian already stated.

Either that, or pause the repmgr daemon (repmgr daemon pause), which means that if the primary does go down "permanently", you'll need to promote the standby yourself (i.e. it won't happen automatically like it does now).
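A minimal sketch of that manual route, assuming the config lives at /etc/repmgr.conf. The run helper and the EXECUTE variable are hypothetical and only there so that the commands are printed rather than executed by default. The command names are the ones used in this thread, so check them against the documentation for your repmgr version.

```shell
#!/bin/sh
# Sketch: manual failover steps while automatic failover is paused.
# Commands are printed, not executed, unless EXECUTE=1 is set.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"   # assumed config path

run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 1: pause the repmgr daemon so it no longer promotes the standby by itself.
run repmgr -f "$REPMGR_CONF" daemon pause

# Step 2 (later, on the standby, only if the primary is gone for good):
# promote the standby by hand.
run repmgr -f "$REPMGR_CONF" standby promote
```

With this approach a primary that merely reboots comes back as primary without any role conflict, at the cost of needing an operator to decide when a failure is permanent.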

Whichever you choose depends on what fits your setup best; there's no perfect solution for this at the moment.

I think the solution would be for repmgr on the old primary to try to rejoin the cluster automatically when it sees there's another primary, instead of letting the PostgreSQL instance come up as primary even though there's a new one. Then issues like this wouldn't come up.
@ibarwick perhaps consider adding this feature if it isn't already planned?
Or is there some reason I didn't think of why this shouldn't be the case?