pgsqld_monitor_0 on server3 'not installed' (5): call=12, status=complete, exitreason

Question


karippery opened this issue 4 years ago · 4 comments

I need some help.

I would like to build a highly available multi-node PostgreSQL cluster on Debian, using freely available software including Pacemaker, Corosync, pcs and PostgreSQL.
I followed this documentation.
I received the following failed actions:

Failed Actions:
* pgsqld_monitor_0 on server3 'not installed' (5): call=12, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
    last-rc-change='Fri Jun 26 16:20:11 2020', queued=0ms, exec=135ms
* pgsqld_monitor_0 on server1 'not installed' (5): call=12, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
    last-rc-change='Fri Jun 26 16:20:06 2020', queued=0ms, exec=133ms
* pgsqld_monitor_0 on server2 'not installed' (5): call=12, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
    last-rc-change='Fri Jun 26 16:19:53 2020', queued=0ms, exec=146ms

/etc/postgresql/9.6/main/postgresql.conf

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on

/etc/postgresql/9.6/main/pg_hba.conf

# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     postgres                                peer
#host    replication     postgres        127.0.0.1/32            md5
#host    replication     postgres        ::1/128                 md5

# forbid self-replication
host     replication     postgres        12.222.179.205/32       reject
host     replication     postgres        oreo                    reject

# allow any standby connection
host     replication     postgres        0.0.0.0/0               trust

pcs resource create pgsqld ocf:heartbeat:pgsqlms \
    bindir="/usr/lib/postgresql/9.6/bin" \
    pgdata="/etc/postgresql/9.6/main" \
    datadir="/var/lib/postgresql/9.6/main" \
    recovery_template="/etc/postgresql/9.6/main/recovery.conf.pcmk" \
    pghost="/var/run/postgresql" \
    op start timeout=60s \
    op stop timeout=60s \
    op promote timeout=30s \
    op demote timeout=120s \
    op monitor interval=15s timeout=10s role="Master" \
    op monitor interval=16s timeout=10s role="Slave" \
    op notify timeout=60s

pcs resource master pgsql-ha pgsqld notify=true

Answer 1 · 2020-06-29T11:33:37.000Z

Hello,

Which versions of Pacemaker, Debian and pcs are you using, please?

Regards,

Answer 2 · 2020-06-29T12:49:52.000Z

Thank you for your reply.

I think I got this error from fencing my cluster nodes. The documentation about fencing is not clear. Could you please explain how I can set up fencing? Is fencing important? Now I have a different error.

* pgsqld_monitor_0 on server3 'not installed' (5): call=5, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
    last-rc-change='Mon Jun 29 14:51:12 2020', queued=0ms, exec=148ms
* pgsqld_monitor_0 on server1 'not installed' (5): call=5, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
    last-rc-change='Mon Jun 29 14:51:07 2020', queued=1ms, exec=142ms
* pgsqld_monitor_0 on server2 'not installed' (5): call=5, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
    last-rc-change='Mon Jun 29 14:50:54 2020', queued=0ms, exec=135ms

version
debian 9.2
pcs 0.9.155
postgresql 9.6

more information

server1

cat <<EOP >> postgresql.conf

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on
EOP
cat <<EOP >> pg_hba.conf

# forbid self-replication
host     replication     postgres        129.226.179.205/32       reject
host     replication     postgres        oreo                    reject

# allow any standby connection
host     replication     postgres        0.0.0.0/0               trust

EOP

cat <<EOP > recovery.conf
standby_mode = on
primary_conninfo = 'host=129.226.179.205 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP

cat <<EOP > recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=129.226.179.205 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP

server2

cat <<EOP >> postgresql.conf

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on
EOP
cat <<EOP >> pg_hba.conf

# forbid self-replication
host     replication     postgres        129.226.179.206/32       reject
host     replication     postgres        oreo                    reject

# allow any standby connection
host     replication     postgres        0.0.0.0/0               trust

EOP

cat <<EOP > recovery.conf
standby_mode = on
primary_conninfo = 'host=129.226.179.206 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP

cat <<EOP > recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=129.226.179.206 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP

server3

cat <<EOP >> postgresql.conf

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on
EOP
cat <<EOP >> pg_hba.conf

# forbid self-replication
host     replication     postgres        129.226.179.207/32       reject
host     replication     postgres        oreo                    reject

# allow any standby connection
host     replication     postgres        0.0.0.0/0               trust

EOP

cat <<EOP > recovery.conf
standby_mode = on
primary_conninfo = 'host=129.226.179.207 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP

cat <<EOP > recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=129.226.179.207 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP

pcs config

Cluster Name: cluster_pgsql
Corosync Nodes:
 server1 server2 server3
Pacemaker Nodes:
 server1 server2 server3

Resources:
 Master: pgsql-ha
  Meta Attrs: notify=true
  Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/lib/postgresql/9.6/bin pgdata=/etc/postgresql/9.6/main datadir=/var/lib/postgresql/9.6/main recovery_template=/etc/postgresql/9.6/main/recovery.conf.pcmk pghost=/var/run/postgresql
   Operations: start interval=0s timeout=60s (pgsqld-start-interval-0s)
               stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
               promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
               demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
               monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
               monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
               notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
 Resource: pgsql-master-ip (class=ocf provider=heartbeat type=IPaddr2)
  Attributes: ip=129.226.179.208 cidr_netmask=32 iflabel=pgrepvip
  Meta Attrs: target-role=Started
  Operations: start interval=0s timeout=20s (pgsql-master-ip-start-interval-0s)
              stop interval=0s timeout=20s (pgsql-master-ip-stop-interval-0s)
              monitor interval=30s (pgsql-master-ip-monitor-interval-30s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 migration-threshold: 5
 resource-stickiness: 10
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster_pgsql
 dc-version: 1.1.16-94ff4df
 have-watchdog: false

Quorum:

Answer 3 · 2020-06-29T12:56:37.000Z

#root@oreo:~# pcs resource create pgsqld ocf:heartbeat:pgsqlms bindir="/usr/lib/postgresql/9.6/bin"  pgdata="/etc/postgresql/9.6/main" datadir="/var/lib/postgresql/9.6/main"  recovery_template="/etc/postgresql/9.6/main/recovery.conf.pcmk" pghost="/var/run/postgresql"  op start timeout=60s  op stop timeout=60s op promote timeout=30s   op demote timeout=120s  op monitor interval=15s timeout=10s role="Master" op monitor interval=16s timeout=10s role="Slave" op notify timeout=60s

#root@oreo:~# pcs resource master pgsql-ha pgsqld notify=true

 pcs resource create pgsql-master-ip ocf:heartbeat:IPaddr2 ip=10.226.179.208 cidr_netmask=32 nic=lo op monitor interval=30s

Answer 4 · 2020-06-30T11:55:18.000Z

Hi,

I think I got this error from fencing my cluster nodes. The documentation about fencing is not clear. Could you please explain how I can set up fencing? Is fencing important?

Yes, fencing is vital. You can set up either active fencing or passive fencing with a watchdog. See the fencing documentation page on the PAF website.
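Before choosing a fencing method, it can help to confirm what the cluster itself reports. The following is only a rough sketch: the helper name check_fencing is hypothetical, and it assumes that crm_verify mentions STONITH in its output when fencing is enabled but no device is defined, as with the empty "Stonith Devices:" section in the pcs config above.

```shell
#!/bin/sh
# Hypothetical helper: look for STONITH complaints in the live cluster configuration.
check_fencing() {
    # --live-check validates the running CIB; --verbose prints the findings.
    if crm_verify --live-check --verbose 2>&1 | grep -qi "stonith"; then
        echo "crm_verify reports a STONITH problem"
        return 1
    fi
    echo "no STONITH complaints from crm_verify"
    return 0
}

# Usage on any cluster node:
# check_fencing
```

A clean result here does not prove that fencing works; it only shows that the configuration check raised no STONITH-related message.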

Now I have a different error.

What you pasted seems identical to me.

I replayed the quick start for Debian 9. I found some small details to adjust there, but nothing related to your issue. And my cluster is up and running smoothly.

Based on your config, you shouldn't have this error. Could you please run the following command and report the result here?

crm_resource --resource pgsql-ha --meta --get-parameter notify 2>/dev/null
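A minimal sketch of how this check could be scripted, assuming the master resource is named pgsql-ha as in the pcs config above. The helper name check_notify is hypothetical; it relies only on the crm_resource call shown above and prints a pcs resource meta command as a suggestion without running it.

```shell
#!/bin/sh
# Hypothetical helper: report whether notify=true is set on a master resource.
check_notify() {
    resource="$1"
    # crm_resource prints the meta attribute value; treat a failure as "unset".
    value=$(crm_resource --resource "$resource" --meta --get-parameter notify 2>/dev/null) || value=""
    if [ "$value" = "true" ]; then
        echo "notify=true is set on $resource"
        return 0
    fi
    echo "notify is '${value:-unset}' on $resource; consider: pcs resource meta $resource notify=true"
    return 1
}

# Usage on any cluster node:
# check_notify pgsql-ha
```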

On a side note, you must set the same IP address in the primary_conninfo host parameter on every node. This parameter tells the standby how to connect to the primary. As far as I understand your current setup, each node tries to connect to itself. The host should be "129.226.179.208". And based on your setup, I suppose oreo must resolve to "129.226.179.208".
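To illustrate the side note, here is a sketch that writes a recovery.conf.pcmk template whose primary_conninfo points at the virtual IP rather than at the node's own address. The function name write_recovery_template and its two arguments are hypothetical; the directory and the address 129.226.179.208 come from the configuration posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: write a recovery template that targets the cluster's virtual IP.
write_recovery_template() {
    conf_dir="$1"   # e.g. /etc/postgresql/9.6/main
    vip="$2"        # address held by the pgsql-master-ip resource
    # The heredoc delimiter is unquoted, so $(hostname -s) expands to this node's
    # short name and each standby gets a distinct application_name.
    cat <<EOP > "$conf_dir/recovery.conf.pcmk"
standby_mode = on
primary_conninfo = 'host=$vip application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP
}

# Usage, identical on server1, server2 and server3:
# write_recovery_template /etc/postgresql/9.6/main 129.226.179.208
```

Running the same call on all three nodes yields templates that differ only in application_name, which is the point of the remark above.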