rejoin hangs

Question

Opened this issue 3 years ago · 0 comments

Hi,
I have a problem with replication manager (repmgr) 5.2.1 and Postgres 12.7. If the old master or a standby is down for a few minutes during a failover, a standby rejoin on that node hangs (with no timeout) because Postgres won't start.
In the process list, one can see that the Postgres startup process is waiting for a WAL file, but this file does not exist on any node, nor was it ever archived.
In earlier versions, repmgr reported an error about a missing WAL segment, and that segment had normally already been archived.
Although I found out that I can fix it by raising wal_keep_segments, I wonder about the non-existent file in the error message and why the behavior has changed. Furthermore, the repmgr documentation states that there is normally no need to change this parameter.
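For reference, the workaround looks roughly like this in postgresql.conf on the nodes that serve WAL. The value 64 is only an illustrative assumption, not a recommendation; with the default 16 MB segment size it keeps about 1 GB of WAL in pg_wal.

```
# postgresql.conf (PostgreSQL 12; the parameter was renamed wal_keep_size in 13)
# Illustrative value only: 64 segments x 16 MB (default segment size) = ~1 GB retained
wal_keep_segments = 64
```

A configuration reload is enough for this setting to take effect; no restart is needed.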
Thanks for any hints.