EnterpriseDB/repmgr

repmgr switchover fails with


Setup:
`
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | nvme | primary | * running |          | default  | 100      | 5        | host=x.1 user=repmgr dbname=repmgr connect_timeout=2
 2  | hdd  | standby |   running | 1        | default  | 100      | 6        | host=x.2 user=repmgr dbname=repmgr connect_timeout=2
`

Steps to reproduce:

  1. Turn off Node 1 (nvme).
  2. Promote Node 2 (hdd). On Node 2, run: repmgr primary unregister --node-id 1
  3. Wipe the PostgreSQL folder on Node 1 (this will happen due to the hardware setup).
  4. Clone from Node 2 to Node 1. On Node 1, run:
    repmgr -h x.2 -U repmgr -d repmgr --fast-checkpoint standby clone
  5. Switch the primary from Node 2 to Node 1. On Node 1, run:
    repmgr standby switchover
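
The commands from steps 2, 4, and 5 can be collected into one script for easier re-runs. This is only a sketch of the steps as reported: the `run` helper and the `DRY_RUN` and `NODE` variables are hypothetical names introduced here, and steps 1 and 3 stay manual. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Prints commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: power off Node 1 (nvme). Manual.

# Step 2, on Node 2 (hdd).
on_node2() {
    run repmgr primary unregister --node-id 1
}

# Step 3: the PostgreSQL folder on Node 1 is wiped. Manual.

# Steps 4 and 5, on Node 1 (nvme).
on_node1() {
    run repmgr -h x.2 -U repmgr -d repmgr --fast-checkpoint standby clone
    run repmgr standby switchover
}

# NODE selects which part to run on the current machine.
case "${NODE:-}" in
    1) on_node1 ;;
    2) on_node2 ;;
    *) echo "set NODE=1 or NODE=2 to select the steps for this machine" ;;
esac
```

Running it with `NODE=2 DRY_RUN=0` on Node 2, and later with `NODE=1 DRY_RUN=0` on Node 1, would execute the commands for real.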

The `repmgr standby switchover` command in step 5 fails with a timeout while "waiting for received WAL to flush to disk".
Running the switchover command with "--always-promote" does not help and leaves the cluster in a broken state, with both Node 1 and Node 2 set as primary.
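
For anyone trying to reproduce this, the cluster state can be inspected on Node 1 before attempting the switchover. The sketch below assumes a repmgr version that provides `cluster show`, `node check`, and the `--dry-run` option of `standby switchover` (repmgr 4.x and later, as far as I know), and that `repmgr` finds its configuration file without an explicit `-f`. The `REPMGR` variable and the `preflight` function are hypothetical names used only for illustration.

```shell
#!/bin/sh
# REPMGR is a hypothetical override so the function can be exercised
# without a real cluster (for example REPMGR=echo).
REPMGR="${REPMGR:-repmgr}"

preflight() {
    # Registered nodes, roles, and upstreams as recorded in the repmgr metadata.
    "$REPMGR" cluster show
    # Health checks for the local node.
    "$REPMGR" node check
    # Check switchover prerequisites without changing anything.
    "$REPMGR" standby switchover --dry-run
}
```

Calling `preflight` on Node 1 runs the three commands in order; all of them are read-only.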

Any idea what causes the error, or how a switchover can be achieved?