Automatic failover not working when terminating the process of postgres server on primary

Question

Opened this issue 4 years ago · 6 comments

[2020-12-01 04:17:45] [WARNING] unable to ping "host=sbx2 user=repmgr dbname=repmgr connect_timeout=2"
[2020-12-01 04:17:45] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-12-01 04:17:45] [WARNING] unable to ping "host=sbx2 user=repmgr dbname=repmgr connect_timeout=2"
[2020-12-01 04:17:45] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-12-01 04:17:47] [WARNING] unable to ping "host=sbx2 user=repmgr dbname=repmgr connect_timeout=2"
[2020-12-01 04:17:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-12-01 04:17:47] [WARNING] unable to ping "host=sbx2 user=repmgr dbname=repmgr connect_timeout=2"
[2020-12-01 04:17:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

WARNING: following issues were detected

when attempting to connect to node "sbx2" (ID: 1), following error encountered :
"could not connect to server: Connection refused
Is the server running on host "sbx2" (10.201.228.77) and accepting
TCP/IP connections on port 5432?"
node "sbx2" (ID: 1) is registered as an active primary but is unreachable
unable to connect to node "sbx1" (ID: 2)'s upstream node "sbx2" (ID: 1)
unable to determine if node "sbx1" (ID: 2) is attached to its upstream node "sbx2" (ID: 1)
unable to connect to node "stt1" (ID: 4)'s upstream node "sbx2" (ID: 1)

The following is the configuration I have used on both the primary and the witness.
The only difference in the configuration on the witness server is that the witness's data directory is different.

node_id=2
node_name='sbx1'
conninfo='host=sbx1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/data'

log_level='NOTICE'
log_file='/home/postgres/repmgr/repdebug.log'

pg_bindir='/usr/pgsql-12/bin/'

failover='automatic'
reconnect_attempts=4
reconnect_interval=5
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'

repmgrd_pid_file='/run/repmgr/repmgrd-12.pid'
service_start_command = 'sudo systemctl start postgresql-12.service'
service_stop_command = 'sudo systemctl stop postgresql-12.service'
service_restart_command = 'sudo systemctl restart postgresql-12.service'
repmgrd_service_start_command = 'sudo systemctl start repmgr12'
repmgrd_service_stop_command = 'sudo systemctl stop repmgr12'

#####################################################################

node_id=1
node_name='sbx2'
conninfo='host=sbx2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/data'

node_id=4
node_name='stt1'
conninfo='host=stt1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'

-bash-4.2$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster crosscheck
 Name | Id |  1 |  2 |  4
------+----+----+----+----
 sbx2 |  1 |  * |  * |  *
 sbx1 |  2 |  * |  * |  *
 stt1 |  4 |  * |  * |  *
-bash-4.2$

########################################

-bash-4.2$ sudo systemctl status repmgr12
● repmgr12.service - A replication manager, and failover management tool for PostgreSQL
Loaded: loaded (/usr/lib/systemd/system/repmgr12.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2020-12-01 03:47:12 CST; 1h 23min ago
Process: 131439 ExecStop=/usr/bin/kill -TERM $MAINPID (code=exited, status=0/SUCCESS)
Process: 131459 ExecStart=/usr/pgsql-12/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose (code=exited, status=0/SUCCESS)
Main PID: 131462 (repmgrd)
CGroup: /system.slice/repmgr12.service
└─131462 /usr/pgsql-12/bin/repmgrd -f /etc/repmgr/12/repmgr.conf -p /run/repmgr/repmgrd-12.pid -d --verbose

Answer 1 · 2020-12-01T11:31:17.000Z

I am also able to perform a switchover (role change and follow). The command below works fine, and the roles change after it is executed.

/usr/pgsql-12/bin/repmgr standby switchover -f /etc/repmgr/12/repmgr.conf --siblings-follow --force-rewind

But automatic failover is not happening. No errors are reported in the repmgrd log. All it says is

[WARNING] unable to ping "host=sbx2 user=repmgr dbname=repmgr connect_timeout=2"
[DETAIL] PQping() returned "PQPING_NO_RESPONSE"

when primary is down.

Do I need additional configuration for automatic failover to work? Am I missing something?

Thanks and Regards
Khanna.43

Answer 2 · 2020-12-02T06:05:06.000Z

If these are the full contents of the repmgr.conf files:

node_id=1
node_name='sbx2'
conninfo='host=sbx2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/data'

node_id=4
node_name='stt1'
conninfo='host=stt1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'

you'll need to add failover='automatic' to each (as the default is manual), and reload the repmgrd configuration. See the documentation for further details: https://repmgr.org/docs/current/repmgrd-basic-configuration.html#REPMGRD-AUTOMATIC-FAILOVER-CONFIGURATION
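
As a rough illustration of that check, here is a minimal Python sketch that scans the text of a repmgr.conf for the settings the repmgr documentation lists as required for automatic failover (failover, promote_command, follow_command). The function names and the simplified key=value parser are assumptions made for this example; they are not part of repmgr, and the parser ignores details such as trailing comments or include directives.

```python
def parse_conf(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, since values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


def failover_problems(text):
    """Return a list of problems; an empty list means the basic settings are present."""
    conf = parse_conf(text)
    problems = []
    # repmgr treats a missing 'failover' setting as manual.
    if conf.get("failover", "manual") != "automatic":
        problems.append("failover is not 'automatic' (the default is manual)")
    for name in ("promote_command", "follow_command"):
        if not conf.get(name):
            problems.append(f"{name} is not set")
    return problems


# The node 1 configuration as quoted in this thread.
node1 = """
node_id=1
node_name='sbx2'
conninfo='host=sbx2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/home/postgres/data'
"""

for problem in failover_problems(node1):
    print(problem)
```

Run against the node 1 file as quoted, this reports all three settings as missing. After editing the file, repmgrd still has to reload or restart before the change takes effect.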

Answer 3 · 2020-12-28T11:27:54.000Z

I faced the same issue with repmgr-5.2.1 and postgresql-12.5.
The nodes with IDs 1, 2, and 5 were shut down gracefully: first the primary node, then the others.

cluster show:

 ID | Name | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                    
----+------+---------+---------------+----------+----------+----------+----------+-------------------------------------------------------
 1  | h188 | standby | ? unreachable | ? h189   | default  | 100      |          | host=h188 user=repmgr dbname=repmgr connect_timeout=2
 2  | h189 | primary | ? unreachable | ?        | default  | 100      |          | host=h189 user=repmgr dbname=repmgr connect_timeout=2
 4  | h181 | standby |   running     | ? h189   | default  | 100      | 3        | host=h181 user=repmgr dbname=repmgr connect_timeout=2
 5  | h182 | standby | ? unreachable | ? h189   | default  | 100      |          | host=h182 user=repmgr dbname=repmgr connect_timeout=2
 6  | h190 | witness | * running     | ? h189   | default  | 0        | n/a      | host=h190 user=repmgr dbname=repmgr connect_timeout=2

daemon status:

 ID | Name | Role    | Status        | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+------+---------+---------------+----------+---------+-------+---------+--------------------
 1  | h188 | standby | ? unreachable | ? h189   | n/a     | n/a   | n/a     | n/a                
 2  | h189 | primary | ? unreachable | ?        | n/a     | n/a   | n/a     | n/a                
 4  | h181 | standby |   running     | ? h189   | running | 29838 | no      | 841 second(s) ago  
 5  | h182 | standby | ? unreachable | ? h189   | n/a     | n/a   | n/a     | n/a                
 6  | h190 | witness | * running     | ? h189   | running | 29162 | no      | 846 second(s) ago  

WARNING: following issues were detected
  - unable to  connect to node "h188" (ID: 1)
  - node "h188" (ID: 1) is registered as an active standby but is unreachable
  - unable to  connect to node "h189" (ID: 2)
  - node "h189" (ID: 2) is registered as an active primary but is unreachable
  - unable to connect to node "h181" (ID: 4)'s upstream node "h189" (ID: 2)
  - unable to determine if node "h181" (ID: 4) is attached to its upstream node "h189" (ID: 2)
  - unable to  connect to node "h182" (ID: 5)
  - node "h182" (ID: 5) is registered as an active standby but is unreachable
  - unable to connect to node "h190" (ID: 6)'s upstream node "h189" (ID: 2)

HINT: execute with --verbose option to see connection error messages

repmgr.conf:

node_id=4
node_name='h181'
conninfo='host=h181 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/12/data'
replication_user='postgres'
replication_type='physical'
use_replication_slots=yes
log_level='WARNING'
log_file='/var/log/repmgr/repmgr.log'
log_status_interval=300
pg_basebackup_options='--wal-method=stream'
ssh_options='-q -o ConnectTimeout=10'
promote_check_timeout=60
promote_check_interval=1
primary_follow_timeout=60
standby_follow_timeout=15
shutdown_check_timeout=60
standby_reconnect_timeout=60
wal_receive_check_timeout=30
node_rejoin_timeout=60
failover='automatic'
priority=100
connection_check_type='ping'
reconnect_attempts=6
reconnect_interval=10
promote_command='/usr/bin/repmgr12 standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr12 standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
primary_notification_timeout=60
repmgrd_standby_startup_timeout=60
standby_disconnect_on_failover=true
sibling_nodes_disconnect_timeout=30
primary_visibility_consensus=true
service_start_command = 'sudo /etc/init.d/postgresql-12 --quiet start'
service_stop_command = 'sudo /etc/init.d/postgresql-12 --quiet stop'
service_restart_command = 'sudo /etc/init.d/postgresql-12 --quiet restart'
service_reload_command = 'sudo /etc/init.d/postgresql-12 --quiet reload'
repmgrd_service_start_command = '/etc/init.d/repmgrd-12 --quiet start'
repmgrd_service_stop_command = '/etc/init.d/repmgrd-12 --quiet stop'
archive_ready_warning=16
archive_ready_critical=128
replication_lag_warning=300
replication_lag_critical=600

log from live standby:

[2020-12-28 14:26:32] [DETAIL] attempted to connect using:
  user=repmgr connect_timeout=2 dbname=repmgr host=h182 fallback_application_name=repmgr options=-csearch_path=
[2020-12-28 14:26:36] [WARNING] unable to ping "host=h189 user=repmgr dbname=repmgr connect_timeout=2"
[2020-12-28 14:26:36] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-12-28 14:26:37] [WARNING] unable to ping "host=h189 user=repmgr dbname=repmgr connect_timeout=2"
[2020-12-28 14:26:37] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2020-12-28 14:26:39] [ERROR] connection to database failed
[2020-12-28 14:26:39] [DETAIL] 
timeout expired

Answer 4 · 2021-01-29T12:22:42.000Z

@grishin-a I assume that if you shut down 3 of 5 nodes, there is no remaining majority of servers, so the failover doesn't take place. But maybe it would sometimes be desirable for a minority of servers to execute a failover, at least if the witness is reachable.
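
The majority argument can be sketched in a few lines of Python. This is only a simplified strict-majority rule; the function name is made up for this example, and exactly which nodes repmgrd counts (for instance, whether the failed primary is included in the total) is an assumption here, not something this thread confirms.

```python
def has_majority(visible_nodes, total_nodes):
    """True only if strictly more than half of the nodes are visible."""
    return visible_nodes > total_nodes / 2


# Scenario from the previous comment: 5 registered nodes, 3 shut down,
# leaving one standby (h181) and the witness (h190).
print(has_majority(visible_nodes=2, total_nodes=5))  # False

# Counting only the 4 nodes that followed the failed primary gives the same result.
print(has_majority(visible_nodes=2, total_nodes=4))  # False

# With one more node up, a majority would exist.
print(has_majority(visible_nodes=3, total_nodes=5))  # True
```

Under either way of counting, two surviving nodes do not form a majority, which is consistent with no promotion taking place.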

Answer 5 · 2021-02-18T16:53:29.000Z

Hi,

I have a similar problem:

OS: Debian 10
PGSQL: 12.6
RepMgr: repmgr 5.2.0 / 50200
Repository: APT PGDG

dpkg output

ii  postgresql-12-repmgr                 5.2.0-2.pgdg100+1            amd64        replication manager for PostgreSQL 12
ii  repmgr                               5.2.0-2.pgdg100+1            all          replication manager for PostgreSQL (metapackage)
ii  repmgr-common                        5.2.0-2.pgdg100+1            all          replication manager for PostgreSQL common files

Passwords are provided by ~/.pgpass, and connections between the nodes use passwordless SSH.

postgres@node2:~$ repmgr -f /etc/postgresql/12/main/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+-------------------------------------------------------
 45 | node1 | standby |   running | node2    | default  | 45       | 8        | host=172.18.43.45 port=5432 user=repmgr dbname=repmgr
 46 | node2 | primary | * running |          | default  | 46       | 8        | host=172.18.43.46 port=5432 user=repmgr dbname=repmgr
 47 | node3 | standby |   running | node2    | default  | 47       | 8        | host=172.18.43.47 port=5432 user=repmgr dbname=repmgr
 51 | node4 | witness | * running | node2    | default  | 0        | n/a      | host=172.18.43.51 port=5432 user=repmgr dbname=repmgr

postgres@node2:~$ repmgr -f /etc/postgresql/12/main/repmgr.conf service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID    | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+--------+---------+--------------------
 45 | node1 | standby |   running | node2    | running | 91707  | no      | 0 second(s) ago
 46 | node2 | primary | * running |          | running | 39100  | no      | n/a
 47 | node3 | standby |   running | node2    | running | 126855 | no      | 1 second(s) ago
 51 | node4 | witness | * running | node2    | running | 49601  | no      | 0 second(s) ago

postgres@node2:~$ psql -c '\l+'
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description                
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 testdb    | test     | UTF8     | fr_FR.UTF-8 | fr_FR.UTF-8 |                       | 39 MB   | pg_default |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 7945 kB | pg_default | default administrative connection database
 repmgr    | repmgr   | UTF8     | en_US.UTF-8 | en_US.UTF-8 |                       | 19 MB   | pg_default |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7801 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            |
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +| 7945 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            |
(5 rows)

postgres@node2:~$ psql -c '\dg+'
                                          List of roles
 Role name |                         Attributes                         | Member of | Description
-----------+------------------------------------------------------------+-----------+-------------
 backup    | Superuser, No inheritance                                  | {}        |
 test      |                                                            | {}        |
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}        |
 repmgr    | Superuser, Replication                                     | {}        |


postgres@node2:~$ psql repmgr -c '\dx+'
      Objects in extension "plpgsql"
            Object description            
-------------------------------------------
function plpgsql_call_handler()
function plpgsql_inline_handler(internal)
function plpgsql_validator(oid)
language plpgsql
(4 rows)
 
          Objects in extension "repmgr"
               Object description               
-------------------------------------------------
function repmgr.get_local_node_id()
function repmgr.get_new_primary()
function repmgr.get_repmgrd_pid()
function repmgr.get_repmgrd_pidfile()
function repmgr.get_upstream_last_seen()
function repmgr.get_upstream_node_id()
function repmgr.get_wal_receiver_pid()
function repmgr.notify_follow_primary(integer)
function repmgr.repmgrd_is_paused()
function repmgr.repmgrd_is_running()
function repmgr.repmgrd_pause(boolean)
function repmgr.reset_voting_status()
function repmgr.set_local_node_id(integer)
function repmgr.set_repmgrd_pid(integer,text)
function repmgr.set_upstream_last_seen(integer)
function repmgr.set_upstream_node_id(integer)
function repmgr.standby_get_last_updated()
function repmgr.standby_set_last_updated()
table repmgr.events
table repmgr.monitoring_history
table repmgr.nodes
table repmgr.voting_term
view repmgr.replication_status
view repmgr.show_nodes

postgres@node2:~$ psql repmgr -c '\dt+ repmgr.*'
                            List of relations
 Schema |        Name        | Type  | Owner  |    Size    | Description
--------+--------------------+-------+--------+------------+-------------
 repmgr | events             | table | repmgr | 128 kB     |
 repmgr | monitoring_history | table | repmgr | 7984 kB    |
 repmgr | nodes              | table | repmgr | 16 kB      |
 repmgr | voting_term        | table | repmgr | 8192 bytes |

When I test automatic failover by shutting down the PostgreSQL service on node2, the repmgr log shows:

[2021-02-18 17:32:04] [ERROR] unable to execute get_primary_current_lsn()
[2021-02-18 17:32:04] [DETAIL]
FATAL:  terminating connection due to administrator command
SSL connection has been closed unexpectedly
 
[2021-02-18 17:32:04] [WARNING] unable to retrieve primary's current LSN
[2021-02-18 17:32:06] [DEBUG] connection check type is "ping"
[2021-02-18 17:32:06] [WARNING] unable to ping "host=172.18.43.46 port=5432 user=repmgr dbname=repmgr"
[2021-02-18 17:32:06] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-02-18 17:32:06] [DEBUG] monitoring upstream node 46 in degraded state for 2 seconds
[2021-02-18 17:32:06] [DEBUG] connection check type is "ping"
[2021-02-18 17:32:06] [WARNING] unable to ping "host=172.18.43.46 port=5432 user=repmgr dbname=repmgr"
[2021-02-18 17:32:06] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-02-18 17:32:06] [DEBUG] scanning 2 node records to detect new primary...
[2021-02-18 17:32:06] [DEBUG] connecting to: "user=repmgr dbname=repmgr host=172.18.43.47 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
[2021-02-18 17:32:08] [DEBUG] connection check type is "ping"
[2021-02-18 17:32:08] [WARNING] unable to ping "host=172.18.43.46 port=5432 user=repmgr dbname=repmgr"
[2021-02-18 17:32:08] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-02-18 17:32:08] [DEBUG] monitoring upstream node 46 in degraded state for 4 seconds
[2021-02-18 17:32:08] [DEBUG] connection check type is "ping"
[2021-02-18 17:32:08] [WARNING] unable to ping "host=172.18.43.46 port=5432 user=repmgr dbname=repmgr"
[2021-02-18 17:32:08] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-02-18 17:32:08] [DEBUG] scanning 2 node records to detect new primary...
[2021-02-18 17:32:08] [DEBUG] connecting to: "user=repmgr dbname=repmgr host=172.18.43.47 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="

postgres@node3:~$ repmgr -f /etc/postgresql/12/main/repmgr.conf cluster show
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.45 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.45 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.46 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.46 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.47 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.46 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.51 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr dbname=repmgr host=172.18.43.46 port=5432 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
 ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+-------------------------------------------------------
 45 | node1 | standby |   running     | ? node2  | default  | 45       | 8        | host=172.18.43.45 port=5432 user=repmgr dbname=repmgr
 46 | node2 | primary | ? unreachable | ?        | default  | 46       |          | host=172.18.43.46 port=5432 user=repmgr dbname=repmgr
 47 | node3 | standby |   running     | ? node2  | default  | 47       | 8        | host=172.18.43.47 port=5432 user=repmgr dbname=repmgr
 51 | node4 | witness | * running     | ? node2  | default  | 0        | n/a      | host=172.18.43.51 port=5432 user=repmgr dbname=repmgr
 
WARNING: following issues were detected
  - unable to connect to node "node1" (ID: 45)'s upstream node "node2" (ID: 46)
  - unable to determine if node "node1" (ID: 45) is attached to its upstream node "node2" (ID: 46)
  - unable to connect to node "node2" (ID: 46)
  - node "node2" (ID: 46) is registered as an active primary but is unreachable
  - unable to connect to node "node3" (ID: 47)'s upstream node "node2" (ID: 46)
  - unable to determine if node "node3" (ID: 47) is attached to its upstream node "node2" (ID: 46)
  - unable to connect to node "node1" (ID: 51)'s upstream node "node2" (ID: 46)
 
HINT: execute with --verbose option to see connection error messages

The same happens on each standby node: they just report that the primary is down, and none of them tries to promote itself.

This is my repmgr.conf:

node_id=45
 
config_directory='/etc/postgresql/12/main'
data_directory='/var/lib/postgresql/12/main'
pg_bindir='/usr/lib/postgresql/12/bin'
 
node_name='node1'
conninfo='host=172.18.43.45 port=5432 user=repmgr dbname=repmgr'
 
failover='automatic'
primary_visibility_consensus=yes
 
promote_command='/usr/lib/postgresql/12/bin/repmgr standby promote -f /etc/postgresql/12/main/repmgr.conf --log-to-file'
follow_command='/usr/lib/postgresql/12/bin/repmgr standby follow -f /etc/postgresql/12/main/repmgr.conf --log-to-file --upstream-node-id=%n'
 
log_file='/var/log/postgresql/repmgr-12-main.log'
log_level='NOTICE'
 
service_start_command='/usr/bin/sudo /usr/bin/pg_ctlcluster 12 main start'
service_stop_command='/usr/bin/sudo /usr/bin/pg_ctlcluster 12 main stop'
service_restart_command='/usr/bin/sudo /usr/bin/pg_ctlcluster 12 main restart'
service_reload_command='/usr/bin/sudo /usr/bin/pg_ctlcluster 12 main reload'
service_promote_command='/usr/bin/sudo /usr/bin/pg_ctlcluster 12 main promote'
 
repmgrd_service_start_command='/usr/bin/sudo /bin/systemctl start repmgrd@12-main.service'
repmgrd_service_stop_command='/usr/bin/sudo /bin/systemctl stop repmgrd@12-main.service'
 
pg_basebackup_options='--label=repmgr_backup_12_main'
ssh_options='-q -o StrictHostKeyChecking=no -o ConnectTimeout=10'
 
monitoring_history='true'
priority=100
promote_check_timeout=10
reconnect_attempts=2
reconnect_interval=1
replication_lag_critical=7200
replication_lag_warning=3600
use_replication_slots='yes'

The configuration is the same on each node, except for node_id and conninfo.

Can you please help me?

Answer 6 · 2021-03-02T09:55:25.000Z

@pierhommedba In the config file for node 45 the priority is 100, but repmgr cluster show reports 45. It looks like you have not re-registered the standby or restarted the repmgrd instances after changing the priority. Maybe you also changed the value of the failover parameter and forgot to restart repmgrd.
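
One way to spot this kind of drift is to compare the priority in repmgr.conf with the value that repmgr cluster show reports. The Python sketch below does this for pasted cluster show output. The helper names are hypothetical, and the parser simply assumes the pipe-separated layout seen in this thread.

```python
def parse_cluster_show(output):
    """Turn pipe-separated `repmgr cluster show` rows into dicts keyed by column header."""
    # Separator, DEBUG and WARNING lines contain no '|' and are skipped.
    rows = [line for line in output.splitlines() if "|" in line]
    header = [cell.strip() for cell in rows[0].split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in row.split("|"))))
        for row in rows[1:]
    ]


def registered_priority(output, node_id):
    """Return the Priority column for the given node ID, or None if the node is absent."""
    for row in parse_cluster_show(output):
        if row.get("ID") == str(node_id):
            return int(row["Priority"])
    return None


# One row copied from the cluster show output earlier in this thread.
CLUSTER_SHOW = """\
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+------------------
 45 | node1 | standby |   running | node2    | default  | 45       | 8        | host=172.18.43.45 port=5432 user=repmgr dbname=repmgr
"""

configured = 100  # priority=100 in node 45's repmgr.conf
registered = registered_priority(CLUSTER_SHOW, 45)
if registered != configured:
    print(f"node 45: repmgr.conf has priority={configured}, but the cluster has {registered}")
```

If the two values differ, the configuration file was probably changed after registration. Re-registering the standby (for example with repmgr standby register --force) and restarting repmgrd should bring them back in line.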