Running "cluster show" from standby or witness hangs when IP is down on primary

Question

Opened this issue 4 years ago · 1 comment

This is more of a question, because I don't know if I have missed some important parameter setting in repmgr.conf.

We run a script on the primary node to simulate a network outage, like this:
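#!/bin/sh
# Take the primary's network interface down for 60 seconds to simulate
# a network outage, then bring it back up (requires root)
ip link set eno1 down
sleep 60
ip link set eno1 up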

During this time there is no problem running the following command from the standby or witness node; it times out correctly:
ssh -q -o StrictHostKeyChecking=no -o ConnectTimeout=1

But when I run "cluster show" from the standby or witness, it just hangs until the IP is up again on the primary.
Nothing appears in the log on the standby or witness until the IP is back again:
repmgr -f repmgr.conf cluster show

In repmgr.conf (I tried different settings on all the nodes involved, with no success):
ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=1'
#ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=10'

What am I doing wrong? I thought that repmgr would immediately detect the problem and take action.

Answer 1 · 2020-11-30T06:12:36.000Z

repmgr cluster show attempts to make database connections to the other node(s), and doesn't use SSH.

Do you have connect_timeout set in your conninfo strings? If this is not present, PostgreSQL's connection will wait until the network connection times out before returning failure, which can be quite a long time depending on your environment.
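As a rough sketch, this is what a conninfo string with connect_timeout might look like in repmgr.conf. The host name, user and database below are hypothetical placeholders; only connect_timeout is the point here. connect_timeout is a standard libpq connection parameter (in seconds), and libpq treats values below 2 as 2, so 2 is used here.

```shell
# repmgr.conf (example values; host/user/dbname are placeholders)
# connect_timeout bounds how long libpq waits for a connection to be
# established, instead of relying on the OS-level TCP timeout
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
```

My understanding is that repmgr stores each node's conninfo in its metadata when the node is registered, so after changing it you may need to re-register the node (for example with "repmgr standby register --force" on a standby) for other nodes to pick up the new value. You can also check the timeout directly with psql, e.g. psql "host=node1 user=repmgr dbname=repmgr connect_timeout=2" while the primary's interface is down.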

It might also be worth looking at your network settings, e.g. net.ipv4.tcp_syn_retries.
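To get a feel for why the default TCP settings can make a connection attempt hang for so long, here is a small sketch that estimates the worst-case wait for a given net.ipv4.tcp_syn_retries value. It assumes the classic Linux behaviour of a 1-second initial SYN retransmission timeout that doubles after every retry; newer kernels may behave differently, so treat the result as an approximation. The function name estimate_syn_timeout is hypothetical.

```shell
#!/bin/sh
# Estimate how long connect() may block on an unreachable host, ASSUMING
# a 1 s initial SYN retransmission timeout that doubles on each retry.
estimate_syn_timeout() {
    retries=$1
    total=0
    rto=1
    i=0
    # One wait after the initial SYN, plus one after each retransmission
    while [ "$i" -le "$retries" ]; do
        total=$((total + rto))
        rto=$((rto * 2))
        i=$((i + 1))
    done
    echo "$total"
}

# Read the current setting on Linux; fall back to the usual default of 6
current=$(cat /proc/sys/net/ipv4/tcp_syn_retries 2>/dev/null || echo 6)
echo "tcp_syn_retries=$current, worst-case wait ~$(estimate_syn_timeout "$current")s"
```

Under these assumptions the default of 6 works out to 127 seconds. The value can be lowered with "sysctl -w net.ipv4.tcp_syn_retries=N", but that affects every outgoing TCP connection on the host, so setting connect_timeout in the conninfo is usually the more targeted fix.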