EnterpriseDB/repmgr

Incorrect random value for repmgr.get_upstream_node_id() on witness node

chvsunny opened this issue · 7 comments

Hi,

We are using repmgr 5.2.1 and observing that the function repmgr.get_upstream_node_id() keeps returning different values even though there has been no change in cluster state.

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:24:59.903786+00 |     101 | witness |                1 |                    1
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:01.733538+00 |     101 | witness |                1 |                    2
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:02.745435+00 |     101 | witness |                1 |                    2
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:03.851729+00 |     101 | witness |                1 |                    2
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:04.751673+00 |     101 | witness |                1 |                    1
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:05.940818+00 |     101 | witness |                1 |                    2
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
      current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:06.78803+00 |     101 | witness |                1 |                    1
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:07.560983+00 |     101 | witness |                1 |                    2
(1 row)

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id() from nodes where node_id = get_local_node_id();
       current_timestamp       | node_id |  type   | upstream_node_id | get_upstream_node_id
-------------------------------+---------+---------+------------------+----------------------
 2022-03-10 02:25:08.370091+00 |     101 | witness |                1 |                    1
(1 row)
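The flipping is also easy to observe continuously with psql's \watch, which re-runs the query buffer at a fixed interval:

pgwitness=# select CURRENT_TIMESTAMP, node_id, type, upstream_node_id, get_upstream_node_id()
pgwitness-# from nodes where node_id = get_local_node_id() \watch 1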

Thanks,
chvsunny.

FYI, we have one witness cluster acting as witness for multiple PostgreSQL clusters; each of them has its own database in the witness cluster. We are seeing the same behaviour, i.e. a different value for get_upstream_node_id(), in all of these databases.
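For example, a quick loop over the witness databases (the database names below are placeholders for our actual ones) shows the same unstable value everywhere:

for db in repmgr_cluster1 repmgr_cluster2 repmgr_cluster3; do
    psql -p <port> -d "$db" -Atc \
        "SELECT current_database(), repmgr.get_upstream_node_id()"
done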

Further observations (a reproduction sketch follows the list):

  1. We stopped all the repmgrd daemons on the witness server and started only one repmgrd.
    For every database in the witness instance, get_upstream_node_id() then returned the upstream node ID of the one cluster whose repmgrd was running.

  2. We started multiple repmgrd daemons.
    The output of get_upstream_node_id() then flipped between the upstream node IDs of the clusters whose repmgrd daemons were running.
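Sketch of how we reproduced observation 2 (config file paths and database names are placeholders):

# start one repmgrd per monitored cluster, each using its own config
# file and its own database in the shared witness instance
/usr/pgsql-<pg_version>/bin/repmgrd -f /etc/repmgr/cluster1.conf --daemonize
/usr/pgsql-<pg_version>/bin/repmgrd -f /etc/repmgr/cluster2.conf --daemonize

# query the same function in each database a few times; the returned
# upstream node ID flips between the two clusters' values
psql -d repmgr_cluster1 -c "SELECT repmgr.get_upstream_node_id()"
psql -d repmgr_cluster2 -c "SELECT repmgr.get_upstream_node_id()"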

Thanks.

Here is our sample configuration file:

# Required configuration items
node_id=<node_id>
node_name='<fqdn_hostname>'
conninfo='host=<fqdn_hostname> port=<port> sslmode=require dbname=repmgr user=repmgr connect_timeout=3 application_name=repmgrd'
data_directory='<datadir>'

# Replication settings
location='<dc>'
use_replication_slots=1

# Witness server settings
witness_sync_interval=15

# Logging settings
log_level='INFO'
log_file='<repmgrLogFile>'
log_status_interval=15

# Event notification settings
event_notification_command='python /<path>/pg_repmgr_notifications.py <port> %n %e %s "%t" "%d"'

# Environment/command settings
pg_bindir='/usr/pgsql-<pg_version>/bin'

# external command options
pg_ctl_options='-s'
pg_basebackup_options='--wal-method=stream'
ssh_options='-p <ssh port>'

# Failover and monitoring settings (repmgrd)
failover='automatic'
priority=<priority>
connection_check_type=connection
reconnect_attempts=6               # Number of attempts which will be made to reconnect to an unreachable primary (or other upstream node)
reconnect_interval=5               # Interval between attempts to reconnect to an unreachable primary (or other upstream node)
failover_validation_command='python /<path>/pre_failover_validation.py <port> %n %a'
promote_command='/usr/pgsql-<pg_version>/bin/repmgr standby promote -f <repmgrConfFile> --log-to-file --log-level DEBUG --verbose'
follow_command='/usr/pgsql-<pg_version>/bin/repmgr standby follow -f <repmgrConfFile> --log-to-file --log-level DEBUG --verbose --upstream-node-id=%n'
monitoring_history=yes
monitor_interval_secs=1
standby_disconnect_on_failover=true
primary_visibility_consensus=true  # If "true", only continue with failover if no standbys have seen
                                   # the primary node recently. *Must* be the same on all nodes.
repmgrd_pid_file='<repmgrPidFile>'

# service control commands
service_start_command='/usr/pgsql-<pg_version>/bin/pg_ctl -D <datadir> start'
service_stop_command='/usr/pgsql-<pg_version>/bin/pg_ctl -D <datadir> stop'
service_restart_command='/usr/pgsql-<pg_version>/bin/pg_ctl -D <datadir> restart'
service_reload_command='/usr/pgsql-<pg_version>/bin/pg_ctl -D <datadir> reload'

repmgrd_service_start_command='/usr/pgsql-<pg_version>/bin/repmgrd -f <repmgrConfFile> --daemonize'
repmgrd_service_stop_command='kill $(cat $(/usr/pgsql-<pg_version>/bin/repmgrd -f <repmgrConfFile> --show-pid-file))'
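To sanity-check each filled-in copy of this template, repmgr's own status commands can be run against the corresponding config file (a sketch; <repmgrConfFile> is the placeholder used above, and repmgr service status is spelled repmgr daemon status on releases before 5.2):

/usr/pgsql-<pg_version>/bin/repmgr -f <repmgrConfFile> cluster show
/usr/pgsql-<pg_version>/bin/repmgr -f <repmgrConfFile> service status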

Hi,

Can someone from the repmgr team look into this?

Thanks,
Sandeep Challa.

> Further observations:
>
> 1. We stopped all the repmgrd daemons on the witness server and started only one repmgrd. For every database in the witness instance, get_upstream_node_id() then returned the upstream node ID of the one cluster whose repmgrd was running.
> 2. We started multiple repmgrd daemons. The output of get_upstream_node_id() then flipped between the upstream node IDs of the clusters whose repmgrd daemons were running.

That's because the upstream node ID is written to shared memory, so it will be overwritten by each running repmgrd instance.

You will need completely separate PostgreSQL instances if you want to use a single server for multiple witnesses.
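A minimal sketch of that layout, assuming two witness instances on the same host with separate data directories and ports (all paths, ports and names here are illustrative):

# one PostgreSQL instance per witness: each gets its own shared memory,
# so the repmgrd daemons can no longer overwrite each other's value
/usr/pgsql-<pg_version>/bin/initdb -D /var/lib/pgsql/witness1
/usr/pgsql-<pg_version>/bin/initdb -D /var/lib/pgsql/witness2

# the repmgr extension must be preloaded in every witness instance
echo "shared_preload_libraries = 'repmgr'" >> /var/lib/pgsql/witness1/postgresql.conf
echo "shared_preload_libraries = 'repmgr'" >> /var/lib/pgsql/witness2/postgresql.conf

/usr/pgsql-<pg_version>/bin/pg_ctl -D /var/lib/pgsql/witness1 -o "-p 5433" start
/usr/pgsql-<pg_version>/bin/pg_ctl -D /var/lib/pgsql/witness2 -o "-p 5434" start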

Hi Ian,

Thanks for the update.
Does this mean that every PostgreSQL cluster needs to have its own witness PostgreSQL instance?

I remember you commenting on an issue that a single witness PostgreSQL instance can act as witness for multiple PostgreSQL clusters, with each cluster having its own database in the witness instance.

Thanks!

Here is the other issue, #660, which I was referring to.