repmgr switchover fails with

Question

tkorach commented 3 years ago

----+------+---------+----------------------+----------+----------+----------+----------+
 2 | hdd | standby | running | 1 | default | 100 | 6 | host=x.2 user=repmgr dbname=repmgr connect_timeout=2

Steps to reproduce:

1. Turn off Node 1 (nvme).
2. Promote Node 2 (hdd). On Node 2, run:
   repmgr primary unregister --node-id 1
3. Wipe the PostgreSQL folder on Node 1 (this will happen because of the hardware setup).
4. Clone from Node 2 to Node 1. On Node 1, run:
   repmgr -h x.2 -U repmgr -d repmgr --fast-checkpoint standby clone
5. Switch the primary from Node 2 to Node 1. On Node 1, run:
   repmgr standby switchover
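
The last step could be rehearsed before it is run for real. The sketch below is not part of the original reproduction; it assumes a repmgr release that supports the --dry-run option of "standby switchover" and that repmgr can find its configuration file without -f. The function name preflight_switchover is made up for illustration.

```shell
# Sketch: rehearse the switchover on Node 1 without changing anything.
# Assumption: repmgr locates repmgr.conf on its own; otherwise add -f <path>.
preflight_switchover() {
    # Show how repmgr currently sees every node (role, status, upstream).
    repmgr cluster show || return 1
    # Check the switchover prerequisites only; nothing is promoted or demoted.
    repmgr standby switchover --dry-run
}
```

Calling preflight_switchover on Node 1 prints the cluster state followed by the result of the prerequisite checks.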

The last command times out with the message "waiting for received WAL to flush to disk".
Running the switchover with "--always-promote" does not help: it leaves the cluster in a broken state, with both Node 1 and Node 2 marked as primary.
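
One way to confirm the "both primary" state independently of repmgr's metadata is to ask each PostgreSQL instance directly. This is a minimal sketch, assuming psql is installed and the repmgr user and database from the connection string above can be reached from where it runs. node_role is a hypothetical helper name, and the address of Node 1 is not shown in this report, so it has to be supplied by hand.

```shell
# Sketch: report whether a PostgreSQL node currently acts as primary or standby.
# pg_is_in_recovery() returns 'f' on a primary and 't' on a standby.
node_role() {
    host="$1"
    # -A and -t make psql print only the bare value.
    in_recovery=$(psql -At -h "$host" -U repmgr -d repmgr \
        -c "SELECT pg_is_in_recovery()") || return 1
    case "$in_recovery" in
        f) echo primary ;;
        t) echo standby ;;
        *) echo unknown; return 1 ;;
    esac
}
```

For example, "node_role x.2" prints the role of Node 2. If the same call against Node 1 also prints "primary", both instances are running out of recovery, not just registered as primary in repmgr.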

Any idea what causes the error, or how a switchover can be achieved in this setup?