EnterpriseDB/repmgr

Following the documentation gives a different result at follow-new-primary


Hello,

I am learning repmgr by following the documentation. The steps in /promoting-standby.html worked, but I got an unexpected result when following follow-new-primary.xml.

Version: repmgr 5.3.2

OS

postgres@bd3:~$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Configuration file:
Note: node_id, node_name and the host in conninfo differ on each node.

node_id=1
node_name='bd1'
conninfo='host=bd1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/14/main'
pg_basebackup_options=''
ssh_options='-q -o ConnectTimeout=10'
service_start_command='sudo pg_ctlcluster 14 main start'
service_stop_command='sudo pg_ctlcluster 14 main stop'
service_restart_command= 'sudo pg_ctlcluster 14 main restart'
service_reload_command='sudo pg_ctlcluster 14 main reload'

Following https://repmgr.org/docs/current/promoting-standby.html works as expected (OK):

postgres@bd1:~$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                   
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | bd1  | primary | * running |          | default  | 100      | 1        | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
 2  | bd2  | standby |   running | bd1      | default  | 100      | 1        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
 3  | bd3  | standby |   running | bd1      | default  | 100      | 1        | host=bd3 user=repmgr dbname=repmgr connect_timeout=2

On primary:

pg_ctlcluster 14 main stop

On first standby bd2:

postgres@bd2:~$ repmgr standby promote
WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
  bd3 (node ID: 3)
NOTICE: promoting standby to primary
DETAIL: promoting server "bd2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "bd2" (ID: 2) was successfully promoted to primary

The cluster status is then:

postgres@bd2:~$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                   
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | bd1  | primary | - failed  | ?        | default  | 100      |          | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
 2  | bd2  | primary | * running |          | default  | 100      | 2        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
 3  | bd3  | standby |   running | ? bd1    | default  | 100      | 1        | host=bd3 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "bd1" (ID: 1)
  - unable to connect to node "bd3" (ID: 3)'s upstream node "bd1" (ID: 1)
  - unable to determine if node "bd3" (ID: 3) is attached to its upstream node "bd1" (ID: 1)

HINT: execute with --verbose option to see connection error messages

The second standby, bd3, then needs to follow bd2:

postgres@bd3:~$ repmgr standby follow
NOTICE: attempting to find and follow current primary
INFO: local node 3 can attach to follow target node 2
DETAIL: local node's recovery point: 0/40000A0; follow target node's fork point: 0/40000A0
NOTICE: setting node 3's upstream to node 2
WARNING: node "bd3" not found in "pg_stat_replication"
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "bd2" (ID: 2)

The documentation does not mention this warning or anything about a missing replication slot: https://github.com/EnterpriseDB/repmgr/blob/master/doc/follow-new-primary.xml#L18
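
The warning names the `pg_stat_replication` view rather than a replication slot, so one way to narrow this down might be to look at both ends of the replication connection directly. The sketch below is only an assumption about how that could be checked, not something taken from the repmgr documentation: `run_sql`, `PRIMARY_SQL`, `STANDBY_SQL` and `CONN_OPTS` are hypothetical names, and the connection settings are copied from the `conninfo` shown above.

```shell
#!/bin/sh
# Sketch: check whether bd3 is really streaming from bd2.
# Assumes psql is on PATH and the repmgr user can connect to both nodes.

# Connection settings copied from the conninfo in repmgr.conf (host added per call).
CONN_OPTS="user=repmgr dbname=repmgr connect_timeout=2"

# Run on the new primary (bd2): one row per connected standby.
# A standby that is streaming should appear here with state 'streaming'.
PRIMARY_SQL="SELECT application_name, client_addr, state, sync_state FROM pg_stat_replication;"

# Run on the standby (bd3): shows which host the WAL receiver is connected to,
# which slot (if any) it uses, and the timeline it is receiving.
STANDBY_SQL="SELECT status, sender_host, slot_name, received_tli FROM pg_stat_wal_receiver;"

# run_sql HOST SQL: execute one statement on HOST, ignoring ~/.psqlrc (-X),
# with expanded output (-x) so wide rows stay readable.
run_sql() {
    psql "host=$1 $CONN_OPTS" -X -x -c "$2"
}
```

With the functions loaded, `run_sql bd2 "$PRIMARY_SQL"` and `run_sql bd3 "$STANDBY_SQL"` would show whether bd3 is listed on bd2 and where bd3's WAL receiver is actually connected. An empty result on either side would suggest the standby is not streaming, whatever `repmgr cluster show` reports.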

The cluster status is then:

postgres@bd2:~$ repmgr cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                   
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
 1  | bd1  | primary | - failed  | ?        | default  | 100      |          | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
 2  | bd2  | primary | * running |          | default  | 100      | 2        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
 3  | bd3  | standby |   running | bd2      | default  | 100      | 1        | host=bd3 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - unable to connect to node "bd1" (ID: 1)

HINT: execute with --verbose option to see connection error messages
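
One detail in this output that may be relevant: bd3 now lists bd2 as its upstream, but it still reports timeline 1 while bd2 is on timeline 2. Whether that indicates a real problem or only a value that has not been refreshed yet is not something I can confirm, so the sketch below is just a way to spot the mismatch mechanically. `cluster_show` and `timeline_mismatches` are hypothetical helper names, and the table is the output above with the connection string column left out.

```shell
#!/bin/sh
# Sketch: flag nodes whose timeline differs from their upstream's timeline.

# Sample input: the `repmgr cluster show` output from this report,
# with the "Connection string" column omitted for brevity.
cluster_show() {
cat <<'EOF'
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline
----+------+---------+-----------+----------+----------+----------+----------
 1  | bd1  | primary | - failed  | ?        | default  | 100      |
 2  | bd2  | primary | * running |          | default  | 100      | 2
 3  | bd3  | standby |   running | bd2      | default  | 100      | 1
EOF
}

# Reads the table on stdin and prints one line per node whose timeline
# is different from the timeline of the node named in its Upstream column.
timeline_mismatches() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Skip the header and separator lines; data rows have at least 8 fields.
        NR > 2 && NF >= 8 {
            name = trim($2)
            up[name] = trim($5)   # Upstream column
            tl[name] = trim($8)   # Timeline column
            names[++n] = name
        }
        END {
            for (i = 1; i <= n; i++) {
                name = names[i]
                u = up[name]
                # Only compare when the upstream is a known node with a timeline.
                if (u != "" && (u in tl) && tl[u] != "" && tl[name] != tl[u])
                    printf "%s is on timeline %s, upstream %s is on timeline %s\n", name, tl[name], u, tl[u]
            }
        }'
}

cluster_show | timeline_mismatches
```

On the sample table this prints `bd3 is on timeline 1, upstream bd2 is on timeline 2`. Against a live cluster the same function could be fed with `repmgr cluster show | timeline_mismatches`, assuming the column layout matches the one shown here.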

Do you have any idea what needs to be done to fix this issue?

I created a replication slot for bd3, but it made no difference.

Robin