EnterpriseDB/repmgr

ERROR: "data_directory" parameter in "repmgr.conf" on "pg1" (ID: 1) is incorrectly configured

Opened this issue · 5 comments

Hello.

I got an error when trying to switch over after doing a PostgreSQL major upgrade:

ERROR: "data_directory" parameter in "repmgr.conf" on "pg1" (ID: 1) is incorrectly configured

I upgraded my cluster from PostgreSQL 13.6 with repmgr 5.3.1 to PostgreSQL 14.2 with repmgr 5.3.1: I upgraded the primary, unregistered the old standby, created a new standby (PostgreSQL 14.2), and registered it. Server pg1 is the primary, server pg2 the standby. I am using CentOS Linux release 7.9.2009 (Core).
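
For reference, the standby re-creation looked roughly like this (a sketch from memory; the clone options are assumptions for a default setup, not the exact command lines I ran):

# on pg2, after pg1 had been upgraded to PostgreSQL 14
sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf standby unregister
sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf standby clone -h 192.168.56.130 -U repmgr -d repmgr --dry-run
sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf standby clone -h 192.168.56.130 -U repmgr -d repmgr
# start PostgreSQL 14 on pg2, then:
sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf standby register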

After that, I tried to switch over my cluster and got this error:

root@pg2 log$ sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf cluster show
 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                            
----+------+---------+-----------+----------+----------+----------+----------+-----------------------------------------------
 1  | pg1  | primary | * running |          | default  | 100      | 1        | host=192.168.56.130 dbname=repmgr user=repmgr
 2  | pg2  | standby |   running | pg1      | default  | 100      | 1        | host=192.168.56.131 dbname=repmgr user=repmgr

root@pg2 log$ sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf standby switchover --dry-run
NOTICE: checking switchover on node "pg2" (ID: 2) in --dry-run mode
INFO: SSH connection to host "192.168.56.130" succeeded
ERROR: "data_directory" parameter in "repmgr.conf" on "pg1" (ID: 1) is incorrectly configured
HINT: execute "repmgr node check --data-directory-config" on "pg1" (ID: 1) to diagnose the issue

root@pg2 log$ sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf standby switchover
NOTICE: executing switchover on node "pg2" (ID: 2)
ERROR: "data_directory" parameter in "repmgr.conf" on "pg1" (ID: 1) is incorrectly configured
HINT: execute "repmgr node check --data-directory-config" on "pg1" (ID: 1) to diagnose the issue

I ran the check on my primary pg1 and the result is OK:

root@pg1 data$ sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf node check --data-directory-config
OK (configured "data_directory" is "/var/lib/pgsql/14/data")

root@pg1 data$ sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf node check --data-directory-config --verbose -LDEBUG
NOTICE: using provided configuration file "/etc/repmgr/14/repmgr.conf"
DEBUG: set_config():
  SET synchronous_commit TO 'local'
DEBUG: get_node_record():
  SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1
DEBUG: get_node_replication_stats():
 SELECT pg_catalog.current_setting('max_wal_senders')::INT AS max_wal_senders,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS attached_wal_receivers,         current_setting('max_replication_slots')::INT AS max_replication_slots,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE slot_type='physical') AS total_replication_slots,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE active IS TRUE AND slot_type='physical')  AS active_replication_slots,         (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE active IS FALSE AND slot_type='physical') AS inactive_replication_slots,         pg_catalog.pg_is_in_recovery() AS in_recovery
DEBUG: get_pg_setting():
  SELECT name, setting   FROM pg_catalog.pg_settings WHERE name = 'data_directory'
DEBUG: get_pg_setting(): returned value is "/var/lib/pgsql/14/data"
OK (configured "data_directory" is "/var/lib/pgsql/14/data")

My PGDATA on servers pg1 and pg2 is stored in /var/lib/pgsql/14/data/, and in repmgr.conf the data_directory parameter is set to:

data_directory='/var/lib/pgsql/14/data/'

I tried changing the value to data_directory='/var/lib/pgsql/14/data', but it did not affect the result.
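
The value PostgreSQL itself reports can be cross-checked against repmgr.conf like this (a sketch; I'm assuming the standard PGDG psql location used elsewhere in this report):

sudo -u postgres /usr/pgsql-14/bin/psql -c "SHOW data_directory;"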

I tried unregistering the standby and primary and re-registering them. I also tried unregistering the servers, removing the repmgr database and extension, reinstalling repmgr on the nodes, and registering the cluster (primary and standby) again from scratch. None of this had any effect.

Finally, I unregistered the standby and primary, removed the repmgr database, removed the repmgr 5.3.1 package for PostgreSQL 14 from the servers (yum remove repmgr_14), and installed repmgr 5.3.0 for PostgreSQL 14 (yum install repmgr_14-5.3.0). After that I set up repmgr in the cluster and registered the primary and standby, and switchover works successfully. I've done several server role switches back and forth with no errors.
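
As a sketch, the downgrade sequence was roughly the following (package names as above; the exact order of the re-registration steps is approximate):

# on both nodes, after unregistering the nodes and dropping the repmgr database
sudo yum remove repmgr_14
sudo yum install repmgr_14-5.3.0
# then, with the repmgr database re-created, on pg1:
sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf primary register
# and on pg2:
sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf standby register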

With repmgr 5.3.1 and PostgreSQL 13 switchover worked, but with the same repmgr 5.3.1 and PostgreSQL 14 it stopped working. Downgrading to repmgr 5.3.0 for PostgreSQL 14 solved the problem.

Reproduced in the same way on repmgr version 5.3.0.

Same here. Running PostgreSQL 14.2 and repmgr 5.3.1 on Ubuntu 20.04 LTS.

I have been unable to reproduce this with PostgreSQL 14.2 / repmgr 5.3.1.

Try executing this command (as root, as that's what you show in your examples) on both nodes:

sudo -u postgres ssh -o Batchmode=yes $other_node /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf node check --data-directory-config -LDEBUG 2>&1

(Replace $other_node with the hostname of the other node). That should output more detail about the reason for the apparent failure.

Hello.
I ran your command on another cluster with the same problem, but the check returned OK:

OK (configured "data_directory" is "/var/lib/pgsql/14/data")

Hello.
I found the cause of the problem.

I decided to check the entries in the repmgr database tables, and found that for the primary, after the major PostgreSQL upgrade, the config_file column in the repmgr.nodes table still contained the PostgreSQL 13 path: '/etc/repmgr/13/repmgr.conf'.

select * from repmgr.nodes;

I updated the path to the config file with a simple UPDATE statement:

update repmgr.nodes set config_file = '/etc/repmgr/14/repmgr.conf' where node_id = 1;

And now everything works.
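
To see which path each node has recorded, the relevant columns can be queried on the primary (a sketch; the column names are taken from the debug output above):

sudo -u postgres /usr/pgsql-14/bin/psql -d repmgr -c "SELECT node_id, node_name, config_file FROM repmgr.nodes;"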

Because a major PostgreSQL upgrade carries the entire cluster over to the new version, the repmgr database was transferred with the old records. I had recreated and re-registered the replicas, so they got the new path to the configuration file, while the primary still had the old one from version 13.

I believe that during a major PostgreSQL upgrade it makes sense to remove the repmgr database from the cluster (if repmgrd is used, the procedure will be more involved), upgrade PostgreSQL, and then configure repmgr on the cluster again.
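
A rough sketch of that sequence, assuming repmgrd is not running and using the paths from this setup (the createdb owner is an assumption based on the conninfo above):

# before pg_upgrade, on the old primary: drop the repmgr metadata so stale paths cannot survive the upgrade
sudo -u postgres /usr/pgsql-13/bin/psql -c "DROP DATABASE repmgr;"
# run pg_upgrade from 13 to 14 as usual, then on the new primary:
sudo -u postgres /usr/pgsql-14/bin/createdb -O repmgr repmgr
sudo -u postgres /usr/pgsql-14/bin/repmgr -f /etc/repmgr/14/repmgr.conf primary register
# finally, re-clone and re-register the standbys against the upgraded primary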