EnterpriseDB/repmgr

repmgr keeps losing its connection to the local (127.0.0.1) PostgreSQL server every few hours


PostgreSQL version = 15.3, repmgr version = 5.4dev

After setting
event_notification_command='/etc/repmgr/on_event.sh "%t" %e %n "%d" %s'
in repmgr.conf, where on_event.sh is:
[[ $5 -eq 1 ]] && status=happened || status=incomplete
echo "$1: $2 on node$3, $4 - $status" >> /etc/repmgr/events.log

events.log shows intermittent disconnections:

2023-09-26 08:44:05+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-26 08:44:07.169209+08: repmgrd_local_reconnect on node2, reconnected to local node after 2 seconds - happened
2023-09-26 11:37:08+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-26 11:37:35.068834+08: repmgrd_local_reconnect on node2, reconnected to local node after 26 seconds - happened
2023-09-26 13:35:55+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-26 13:36:06.068954+08: repmgrd_local_reconnect on node2, reconnected to local node after 10 seconds - happened
2023-09-26 14:56:56+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-26 14:57:06.074974+08: repmgrd_local_reconnect on node2, reconnected to local node after 9 seconds - happened
2023-09-26 17:40:02+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-26 17:40:02.08399+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened
2023-09-26 23:26:04+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-26 23:26:44.079573+08: repmgrd_local_reconnect on node2, reconnected to local node after 39 seconds - happened
2023-09-27 03:02:53+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-27 03:02:56.09054+08: repmgrd_local_reconnect on node2, reconnected to local node after 2 seconds - happened
2023-09-27 07:56:41+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-27 07:56:41.093777+08: repmgrd_local_reconnect on node2, reconnected to local node after 0 seconds - happened
2023-09-27 12:50:50+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-27 12:51:05.061954+08: repmgrd_local_reconnect on node2, reconnected to local node after 15 seconds - happened
2023-09-27 15:49:01+0800: repmgrd_local_disconnect on node2, unable to connect to local node - happened
2023-09-27 15:49:06.954451+08: repmgrd_local_reconnect on node2, reconnected to local node after 5 seconds - happened

So a disconnection event occurs every few hours; the gaps in this log range from under 1.5 hours to almost 6 hours. Reconnection is not always immediate either: it took anywhere from 0 to 39 seconds, even though I did nothing to repmgr or the PostgreSQL server in the meantime.
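To put numbers on the reconnect delays instead of reading them off by eye, a small awk pass over events.log may help. This is only a minimal sketch, assuming the log keeps the "reconnected to local node after N seconds" wording shown above; the function name summarize_reconnects is hypothetical and not part of repmgr.

```shell
#!/usr/bin/env bash
# Summarize reconnect delays from an events.log in the format shown above.
# summarize_reconnects is a hypothetical helper name, not a repmgr command.
summarize_reconnects() {
    awk '
        /repmgrd_local_reconnect/ {
            # Take the number that follows the word "after" as the delay.
            for (i = 1; i < NF; i++) {
                if ($i == "after") {
                    secs = $(i + 1) + 0
                    n++
                    sum += secs
                    if (n == 1 || secs < min) min = secs
                    if (n == 1 || secs > max) max = secs
                }
            }
        }
        END {
            if (n > 0)
                printf "reconnects=%d min=%ds max=%ds avg=%.1fs\n", n, min, max, sum / n
            else
                print "no reconnect events found"
        }
    ' "$1"
}
```

It would be called as: summarize_reconnects /etc/repmgr/events.log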

I have set up a keepalive ("ping-pong") mechanism in postgresql.conf to keep routers from silently dropping idle TCP connections. With it, a remote connection from a DBeaver client to the PostgreSQL server stays up without problems even after days:

tcp_keepalives_idle = 20
tcp_keepalives_interval = 10
tcp_keepalives_count = 3
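
As a rough worked example of what these values imply: a dead peer on an idle connection should be noticed after about idle + interval x count seconds. This is a sketch of the usual TCP keepalive arithmetic, an assumption about kernel behaviour rather than anything measured on this system, and it covers only the sockets the server side holds open.

```shell
#!/usr/bin/env bash
# Values copied from the postgresql.conf settings above.
idle=20       # tcp_keepalives_idle: seconds of inactivity before the first probe
interval=10   # tcp_keepalives_interval: seconds between unanswered probes
count=3       # tcp_keepalives_count: unanswered probes before the connection is dropped

# Approximate worst-case time to detect a dead peer on an idle connection.
detect_after=$(( idle + interval * count ))
echo "dead peer detected after ~${detect_after}s"
```

With the settings above this works out to roughly 50 seconds.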