EnterpriseDB/repmgr

Failover not triggered on unreachable state

Closed this issue · 2 comments

Hey,

I have a 3-node cluster deployed on VMs. There are some network issues that make the primary unavailable to the other nodes in the cluster.

When we run the cluster show command, both of the other servers show the primary as unreachable, but repmgr doesn't trigger a failover. Furthermore, repmgr shows no logs of monitoring the primary (which is also the upstream node).

Repmgr works well in other situations, such as a Postgres service crash or a server crash.

We have configured repmgr like this:

failover=automatic
reconnection_attempts=4
reconnect_interval=5

This seems to me to be the relevant configuration for this problem.
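As a rough illustration of what these settings imply, here is a minimal Python sketch of the wait before a failover decision. It assumes the delay is simply attempts multiplied by interval and ignores any per-connection timeout; the function name is hypothetical and not part of repmgr.

```python
def detection_window_seconds(attempts: int, interval_seconds: int) -> int:
    """Approximate time spent retrying the upstream node before giving up.

    Assumption: each reconnection attempt is separated by a fixed interval,
    and connection timeouts are not counted.
    """
    return attempts * interval_seconds


# Values taken from the configuration posted above.
window = detection_window_seconds(attempts=4, interval_seconds=5)
print(f"Approximate wait before failover is considered: {window} s")
```

With the values above, a failover decision would be expected roughly 20 seconds after the primary stops responding, if the daemon is monitoring it at all.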

Do you have any idea why repmgr doesn't trigger a failover in a situation like this, and why it doesn't write any logs either?
Does the "unreachable" state not trigger failover? What causes a server to enter the "unreachable" state rather than "failed"? (The server is inaccessible.)

Can you confirm that the repmgrd daemon is running on all nodes? The logs from at least one node should clearly show if and when disconnections occurred, provided the network issue actually disrupted the Postgres connections repmgrd makes to each node.

What does this command show:

repmgr service status

If it is running on all nodes, how are you checking the logs? The repmgr daemon is very chatty even under normal operating circumstances.
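To check whether the network issue disrupts connections between nodes at all, a quick TCP probe run from each standby may help. This is only a sketch: it tests plain TCP reachability of the Postgres port, which is not the same check repmgrd performs, and the function name is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only confirms that the port accepts connections; it does not
    authenticate or run a query against Postgres.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling `can_reach("primary.example", 5432)` from a standby (with a placeholder hostname replaced by the real one) returns False if the primary's Postgres port cannot be reached from that node.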

Actually, I take this back. Please do not open duplicate tickets. Closing as dupe of #722.