promoting standby does not change role

Question

promoting standby does not change role

Closed this issue 2 months ago · 7 comments

Answer 1 · 2019-05-23T15:58:03.000Z

hello,

if the primary fails and repmgr promotes the standby, "repmgr cluster show" still shows the role of the standby as standby, but with the status
! standby running as primary
how do I fix this issue and why wouldn't the role be changed to primary?
I have repmgr 4.3 for postgresql 11.

Answer 2 · 2019-05-24T04:42:33.000Z

which node are you running repmgr cluster show on?
is the former primary still running?

You will see this output if a standby was promoted, but (for whatever reason) the original primary is still running and you run repmgr cluster show on the original primary, as it's not expecting to see its former standby running as primary.
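To check for this condition from a script, something like the following could work. It is only a sketch: `has_unexpected_primary` is a made-up helper name, and it assumes the status text contains "running as primary" flagged with "!" as quoted in this thread.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if `repmgr cluster show` output read from stdin
# contains a row flagged with "!" whose status says "running as primary".
# has_unexpected_primary is a hypothetical helper, not part of repmgr.
has_unexpected_primary() {
    grep -q '!.*running as primary'
}

# Example use (assumes repmgr is installed and /etc/repmgr.conf is the config):
#   repmgr -f /etc/repmgr.conf cluster show | has_unexpected_primary \
#       && echo "a node registered as standby is running as primary"
```

Running the check on every node is probably worthwhile, since each node can report a different view of the cluster.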

If this is the case, you may be able to rejoin the former primary as a standby of the new primary by executing repmgr node rejoin, possibly using the --force-rewind option if available and appropriate.

Documentation: repmgr node rejoin
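
A rough sketch of that rejoin sequence, to be run on the former primary with its PostgreSQL instance shut down. The conninfo string (`host=node2 user=repmgr dbname=repmgr`) is a placeholder and `run` / `rejoin_former_primary` are hypothetical helper names. By default the script only prints the commands; set `APPLY=1` to execute them. Note that `--force-rewind` relies on pg_rewind, which needs `wal_log_hints = on` or data checksums on the cluster.

```shell
#!/bin/sh
# Sketch only: rejoin a former primary as a standby of the new primary.
# Placeholders (not values from this thread): the conninfo of the new primary.
NEW_PRIMARY_CONNINFO=${NEW_PRIMARY_CONNINFO:-"host=node2 user=repmgr dbname=repmgr"}
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

# run: print the command; execute it only when APPLY=1 is set.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

rejoin_former_primary() {
    # 1. Check the prerequisites without changing anything.
    run repmgr -f "$REPMGR_CONF" node rejoin -d "$NEW_PRIMARY_CONNINFO" \
        --force-rewind --dry-run || return 1
    # 2. Perform the rejoin; --force-rewind allows pg_rewind to be used
    #    if this node has diverged from the new primary.
    run repmgr -f "$REPMGR_CONF" node rejoin -d "$NEW_PRIMARY_CONNINFO" \
        --force-rewind || return 1
    # 3. Confirm the roles afterwards.
    run repmgr -f "$REPMGR_CONF" cluster show
}
```

Calling `rejoin_former_primary` prints the three commands for review; `APPLY=1 rejoin_former_primary` would run them as the postgres system user.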

Answer 3 · 2019-05-24T05:39:40.000Z

On May 23, 2019 at 9:42 PM, Ian Barwick wrote:

> which node are you running repmgr cluster show on?

on the standby

> is the former primary still running?

it gets rebooted, i.e. becomes unreachable, and then after some time becomes a primary. so the standby's role remains standby but i get the message that it is running as primary, and the primary defaults to its original state without having to rejoin the cluster.

thanks
-kamal

Answer 4 · 2019-05-24T07:34:17.000Z

if I remove the code for promotion & automatic failover in /etc/repmgr.conf, the standby changes status to "! running as primary" but its role remains standby.
then when i try

repmgr standby promote

it says
ERROR!: STANDBY PROMOTE can only be used on a standby node

The configuration for my setup is taken from this

https://medium.com/@victor.boissiere/how-to-setup-postgresql-cluster-with-repmgr-febc2f10c243

what exactly is triggering this status change? if I can keep the setup intact, i will be able to try, line by line on the command line, what repmgr.conf specifies.

Answer 5 · 2019-05-24T11:34:36.000Z

i found that repmgr standby promote does change the standby to primary, but as soon as the original primary comes back, the standby changes its role from primary back to standby and the status says it is running as primary. in my setup the primary may crash often but will seldom go down permanently. it may also be rebooted during a software upgrade, which means it is going to come back.

Answer 6 · 2019-05-24T12:04:05.000Z

@kamal-prasad in that case, the correct way to bring it back should be repmgr node rejoin -h current_primary as Ian already stated.

Either that, or disable repmgr daemon (repmgr daemon pause) which means if the primary does go down "permanently", you'll need to promote the standby yourself somehow (aka it won't happen automatically like it does now).

Whichever you choose depends on what fits your setup best; there's no perfect solution for this at the moment.
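
For the second option, a minimal sketch of pausing automatic failover and promoting by hand, using the `repmgr daemon pause` / `unpause` / `status` commands from the 4.x series (later releases may name these differently). `run` and the three function names are hypothetical helpers, and nothing is executed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch only: pause repmgrd failover, promote manually, then resume.
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

# run: print the command; execute it only when APPLY=1 is set.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

pause_failover() {
    # Pauses failover handling by the repmgrd instances, then shows their state.
    run repmgr -f "$REPMGR_CONF" daemon pause || return 1
    run repmgr -f "$REPMGR_CONF" daemon status
}

manual_promote() {
    # Run on the standby that should take over, once the old primary
    # is known to be gone for good.
    run repmgr -f "$REPMGR_CONF" standby promote || return 1
    run repmgr -f "$REPMGR_CONF" cluster show
}

resume_failover() {
    run repmgr -f "$REPMGR_CONF" daemon unpause
}
```

With failover paused, a rebooting primary simply comes back as primary and the standby reconnects, at the cost of needing `manual_promote` if the primary never returns.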

I think if repmgr on the old primary tried to rejoin the cluster automatically when it sees there's another primary, instead of letting the PostgreSQL instance come up as primary even though there's a new one, that would be the solution and issues like this wouldn't come up.
@ibarwick perhaps consider adding this feature if it isn't already planned?
Or is there some reason I didn't think of why this shouldn't be the case?

Answer 7 · 2019-05-24T19:37:48.000Z

ok - there was some confusion at my end. the error i described is visible on the standby and not the primary. the primary does show it as 2 primaries. the current infra is sufficient for me to rejoin the cluster after the old primary comes back to life. but i faced an issue with postgresql not coming up after a rejoin on either node. it started fine after a reboot though.