Split brain produced by stopping postgres or shutting down VM

Question

Opened this issue 2 years ago · 2 comments

robinportigliatti commented 2 years ago

Hello,

repmgrd does not seem to handle failover correctly, which results in split brain.

All config files are at the bottom.

bd1 => primary
bd2 => standby
bd3 => witness

1 - Two `start postgresql` after an automatic failover

Here is the first scenario:

systemctl stop postgresql
# waiting for bd2 to become primary automatically
# bd2 is primary
systemctl start postgresql

After some time bd1 is stopped automatically by repmgrd, since there are now two primaries (bd1 and bd2).

But if I start the postgresql service again, bd1 is not stopped and I am in a split-brain scenario.
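One way to confirm the split brain from this scenario is to ask each PostgreSQL instance directly whether it is in recovery. The sketch below assumes the `repmgr` user can connect to `bd1` and `bd2` with `psql`, as in the `conninfo` strings of this report. The function names `role_from_recovery_flag` and `node_role` are illustrative and not part of repmgr.

```shell
#!/bin/sh
# Map the output of "SELECT pg_is_in_recovery()" (t or f) to a role name.
role_from_recovery_flag() {
    case "$1" in
        f) echo primary ;;   # not in recovery: the node accepts writes
        t) echo standby ;;   # in recovery: the node is replaying WAL
        *) echo unknown ;;   # empty or unexpected output, e.g. connection failure
    esac
}

# Ask one node for its recovery state. Host, user and database are the ones
# used in this report; adjust them for another cluster.
node_role() {
    flag=$(psql -h "$1" -U repmgr -d repmgr -tAc 'SELECT pg_is_in_recovery()' 2>/dev/null)
    role_from_recovery_flag "$flag"
}

# Usage (not run here):
#   for h in bd1 bd2; do echo "$h: $(node_role "$h")"; done
```

If both `bd1` and `bd2` report `primary`, both nodes accept writes and the cluster is in split brain.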

2 - VM is down

bd1's VM is shut down.

bd2 is now primary after an automatic failover.

bd1's VM is up.

After some time bd1 is NOT stopped automatically by repmgrd, although it should be, since there are now two primaries (bd1 and bd2).

Conclusion

In both scenarios the postgresql service should be stopped automatically to reduce split-brain scenarios.

In all scenarios, `cluster show` shows something different depending on the node it is run on.

bd1:

ID | Name | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string                                   
----+------+---------+----------------------+----------+----------+----------+----------+------------------------------------------------------
1  | bd1  | primary | * running            |          | bd3      | 100      | 1        | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
2  | bd2  | standby | ! running as primary |          | bd3      | 100      | 2        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
3  | bd3  | witness | * running            | ! bd2    | bd3      | 0        | n/a      | host=bd3 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
 - node "bd2" (ID: 2) is registered as standby but running as primary
 - node "bd3" (ID: 3) reports a different upstream (reported: "bd2", expected "bd1")

bd2, bd3:

ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                   
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------
1  | bd1  | primary | ! running |          | bd3      | 100      | 1        | host=bd1 user=repmgr dbname=repmgr connect_timeout=2
2  | bd2  | primary | * running |          | bd3      | 100      | 2        | host=bd2 user=repmgr dbname=repmgr connect_timeout=2
3  | bd3  | witness | * running | bd2      | bd3      | 0        | n/a      | host=bd3 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
 - node "bd1" (ID: 1) is running but the repmgr node record is inactive
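The two views above can be checked mechanically. The following is a minimal sketch that counts the rows of a `repmgr cluster show` table that describe a running primary. It assumes the column layout shown in this report (Role in the third column, Status in the fourth), which may differ between repmgr versions. The function name `count_running_primaries` is hypothetical.

```shell
#!/bin/sh
# Read "repmgr cluster show" table output on stdin and print how many nodes
# appear to be running as primary. A row counts when either:
#   - its Role is "primary" and its Status ends in "running", or
#   - its Status ends in "running as primary" (registered as standby).
count_running_primaries() {
    awk -F'|' '
        NF >= 4 {
            role = $3; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", role)      # trim padding
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status ~ /running as primary$/ ||
                (role == "primary" && status ~ /running$/))
                n++
        }
        END { print n + 0 }
    '
}

# Usage (not run here):
#   repmgr -f /etc/repmgr.conf cluster show | count_running_primaries
```

A result greater than 1 corresponds to the situation in both tables above, where bd1 and bd2 are each running as primary.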

Config files:

repmgr

`/etc/default/repmgrd`:

# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=yes

# configuration file (required)
REPMGRD_CONF="/etc/repmgr.conf"

# additional options
#REPMGRD_OPTS=""

# user to run repmgrd as
REPMGRD_USER=postgres

# repmgrd binary
REPMGRD_BIN=/usr/bin/repmgrd

# pid file
REPMGRD_PIDFILE=/var/run/repmgrd.pid

`/etc/repmgr.conf`

Note: names and connection strings change depending on which node you are on.

node_id=1
node_name='bd1'
conninfo='host=bd1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/14/main'
pg_basebackup_options=''
ssh_options='-q -o ConnectTimeout=10'
pg_bindir='/usr/lib/postgresql/14/bin/'
service_start_command='sudo /bin/systemctl start postgresql@14-main.service'
service_stop_command='sudo /bin/systemctl stop postgresql@14-main.service'
service_restart_command='sudo /bin/systemctl restart postgresql@14-main.service'
service_reload_command='sudo /bin/systemctl reload postgresql@14-main.service'
location='bd3'
primary_visibility_consensus=true
monitoring_history=true
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
log_file='/var/log/repmgr/repmgr.log'
log_facility='STDERR'
#event_notifications='repmgrd_failover_promote,repmgrd_standby_reconnect,standby_switchover'
#event_notification_command='/scripts/example/repmgr_events.sh %e %n'
repmgrd_service_start_command='sudo /usr/bin/systemctl start repmgrd'
repmgrd_service_stop_command='sudo /usr/bin/systemctl stop repmgrd'
child_nodes_connected_min_count=1
child_nodes_connected_include_witness='true'
child_nodes_disconnect_command='sudo /bin/systemctl stop postgresql@14-main.service'
async_query_timeout=60
reconnect_attempts=2
reconnect_interval=10

System

`/etc/os-release`

NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

`/etc/sudoers`

[...]
postgres ALL = NOPASSWD: /bin/systemctl start repmgrd, /bin/systemctl stop repmgrd, /bin/systemctl stop postgresql@14-main.service, /bin/systemctl start postgresql@14-main.service, /bin/systemctl restart postgresql@14-main.service, /bin/systemctl reload postgresql@14-main.service

Postgres

`postgresql.conf`

max_replication_slots = 10
wal_level = 'hot_standby'
hot_standby = on
archive_mode = on
archive_command = 'scp %p bd3:/tmp/%f' # ugly, just for testing pg_rewind with rejoin
listen_addresses = '*'
wal_log_hints = on
restore_command = 'scp bd3:/tmp/%f %p'  # ugly, just for testing pg_rewind with rejoin
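The comments above mention testing pg_rewind with rejoin. As a hedged sketch of what that manual recovery could look like, the helper below only prints a dry-run `repmgr node rejoin` command for the old primary. It assumes PostgreSQL on that node is already stopped and that the connection string passed in points at the current primary (bd2 in this report). The helper name `print_rejoin_dry_run` is hypothetical.

```shell
#!/bin/sh
# Print (do not execute) a dry-run rejoin command for the old primary.
# $1: conninfo of the current primary, e.g. the bd2 conninfo from this report.
print_rejoin_dry_run() {
    printf "repmgr node rejoin -f /etc/repmgr.conf -d '%s' --force-rewind --verbose --dry-run\n" "$1"
}

# Usage (not run here), on bd1 as the postgres user with PostgreSQL stopped:
#   print_rejoin_dry_run 'host=bd2 user=repmgr dbname=repmgr'
# Review the printed command, run it, then repeat without --dry-run.
```

`--force-rewind` makes repmgr use pg_rewind, which needs `wal_log_hints = on` (set above) or data checksums.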

`pg_hba.conf`

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                trust
#host   replication     postgres        127.0.0.1/32            trust
#host   replication     postgres        ::1/128                 trust

local replication all  peer
host replication all 127.0.0.1/32 scram-sha-256
host replication all 127.0.2.1/32 scram-sha-256
host replication all ::1/128 scram-sha-256
host replication repmgr 192.168.60.0/24 scram-sha-256
local all all  peer
host all all 127.0.0.1/32 scram-sha-256
host all all 127.0.2.1/32 scram-sha-256
host all all 192.168.60.0/24 scram-sha-256

`.pgpass`

The pgpass file mode is 600 and it is set up correctly.

Best regards,

Answer 1 · 2022-09-09T14:31:18.000Z

Note that tweaking boot order of both repmgr and postgresql services does nothing.

Answer 2 · 2022-09-15T15:58:27.000Z

I have the same issue when powering off the primary: child_nodes_disconnect_command is not executed, even though the old primary now has 0 child nodes and should be fenced.

repmgrd gives the following output:

[2022-09-15 17:54:03] [NOTICE] 0 (of 1) child nodes are connected, but at least 1 child nodes required
[2022-09-15 17:54:03] [INFO] no child nodes have detached since repmgrd startup
[2022-09-15 17:54:09] [NOTICE] 0 (of 1) child nodes are connected, but at least 1 child nodes required
[2022-09-15 17:54:09] [INFO] no child nodes have detached since repmgrd startup
[2022-09-15 17:54:15] [NOTICE] 0 (of 1) child nodes are connected, but at least 1 child nodes required
[2022-09-15 17:54:15] [INFO] no child nodes have detached since repmgrd startup
[2022-09-15 17:54:21] [NOTICE] 0 (of 1) child nodes are connected, but at least 1 child nodes required
[2022-09-15 17:54:21] [INFO] no child nodes have detached since repmgrd startup
[2022-09-15 17:54:27] [NOTICE] 0 (of 1) child nodes are connected, but at least 1 child nodes required
[2022-09-15 17:54:27] [INFO] no child nodes have detached since repmgrd startup
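The log above can be summarised with a small filter. This is a sketch that assumes the exact NOTICE wording shown in this thread; the function names are hypothetical. It extracts the connected, total and required child-node counts from the most recent notice and reports whether the node is below the configured minimum.

```shell
#!/bin/sh
# Read repmgrd log lines on stdin and print "connected total required"
# from the most recent child-node NOTICE, or nothing if none is found.
last_child_node_notice() {
    sed -n 's/.*\[NOTICE\] \([0-9][0-9]*\) (of \([0-9][0-9]*\)) child nodes are connected, but at least \([0-9][0-9]*\) child nodes required.*/\1 \2 \3/p' | tail -n 1
}

# Succeed when the most recent notice shows fewer connected child nodes
# than child_nodes_connected_min_count requires.
below_child_node_minimum() {
    set -- $(last_child_node_notice)
    [ "$#" -eq 3 ] && [ "$1" -lt "$3" ]
}

# Usage (not run here), with the log_file path from repmgr.conf above:
#   below_child_node_minimum < /var/log/repmgr/repmgr.log && echo "below minimum"
```

In the output above the threshold is clearly unmet (0 connected, 1 required), yet each notice is followed by "no child nodes have detached since repmgrd startup". One possible reading, which is an assumption based on the log text alone, is that repmgrd only acts on child nodes it has seen detach since it started.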
