pgsqld_monitor_0 on server3 'not installed' (5): call=12, status=complete, exitreason
karippery opened this issue · 4 comments
I need help.
I would like to build a highly available multi-node PostgreSQL cluster using freely available software, including Pacemaker, Corosync, pcs and PostgreSQL, on Debian.
I followed this documentation.
I received the following failed actions:
Failed Actions:
* pgsqld_monitor_0 on server3 'not installed' (5): call=12, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
last-rc-change='Fri Jun 26 16:20:11 2020', queued=0ms, exec=135ms
* pgsqld_monitor_0 on server1 'not installed' (5): call=12, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
last-rc-change='Fri Jun 26 16:20:06 2020', queued=0ms, exec=133ms
* pgsqld_monitor_0 on server2 'not installed' (5): call=12, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
last-rc-change='Fri Jun 26 16:19:53 2020', queued=0ms, exec=146ms
/etc/postgresql/9.6/main/postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on
/etc/postgresql/9.6/main/pg_hba.conf
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local replication postgres peer
#host replication postgres 127.0.0.1/32 md5
#host replication postgres ::1/128 md5
# forbid self-replication
host replication postgres 12.222.179.205/32 reject
host replication postgres oreo reject
# allow any standby connection
host replication postgres 0.0.0.0/0 trust
pcs resource create pgsqld ocf:heartbeat:pgsqlms bindir="/usr/lib/postgresql/9.6/bin" pgdata="/etc/postgresql/9.6/main" datadir="/var/lib/postgresql/9.6/main" recovery_template="/etc/postgresql/9.6/main/recovery.conf.pcmk" pghost="/var/run/postgresql" op start timeout=60s op stop timeout=60s op promote timeout=30s op demote timeout=120s op monitor interval=15s timeout=10s role="Master" op monitor interval=16s timeout=10s role="Slave" op notify timeout=60s
pcs resource master pgsql-ha pgsqld notify=true
Hello,
Which versions of Pacemaker, Debian and pcs are you running, please?
Regards,
Thank you for the reply.
I think I got this error from fencing my cluster nodes. The documentation about fencing is not clear; could you please explain how I can set up fencing? Is fencing important? Now I have a different error.
* pgsqld_monitor_0 on server3 'not installed' (5): call=5, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
last-rc-change='Mon Jun 29 14:51:12 2020', queued=0ms, exec=148ms
* pgsqld_monitor_0 on server1 'not installed' (5): call=5, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
last-rc-change='Mon Jun 29 14:51:07 2020', queued=1ms, exec=142ms
* pgsqld_monitor_0 on server2 'not installed' (5): call=5, status=complete, exitreason='You must set meta parameter notify=true for your master resource',
last-rc-change='Mon Jun 29 14:50:54 2020', queued=0ms, exec=135ms
version
debian 9.2
pcs 0.9.155
postgresql 9.6
more information
server1
cat <<EOP >> postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on
EOP
cat <<EOP >> pg_hba.conf
# forbid self-replication
host replication postgres 129.226.179.205/32 reject
host replication postgres oreo reject
# allow any standby connection
host replication postgres 0.0.0.0/0 trust
EOP
cat <<EOP > recovery.conf
standby_mode = on
primary_conninfo = 'host=129.226.179.205 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP
cat <<EOP > recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=129.226.179.205 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP
server2
cat <<EOP >> postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on
EOP
cat <<EOP >> pg_hba.conf
# forbid self-replication
host replication postgres 129.226.179.206/32 reject
host replication postgres oreo reject
# allow any standby connection
host replication postgres 0.0.0.0/0 trust
EOP
cat <<EOP > recovery.conf
standby_mode = on
primary_conninfo = 'host=129.226.179.206 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP
cat <<EOP > recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=129.226.179.206 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP
server3
cat <<EOP >> postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
hot_standby = on
hot_standby_feedback = on
logging_collector = on
EOP
cat <<EOP >> pg_hba.conf
# forbid self-replication
host replication postgres 129.226.179.207/32 reject
host replication postgres oreo reject
# allow any standby connection
host replication postgres 0.0.0.0/0 trust
EOP
cat <<EOP > recovery.conf
standby_mode = on
primary_conninfo = 'host=129.226.179.207 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP
cat <<EOP > recovery.conf.pcmk
standby_mode = on
primary_conninfo = 'host=129.226.179.207 application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP
pcs config
Cluster Name: cluster_pgsql
Corosync Nodes:
server1 server2 server3
Pacemaker Nodes:
server1 server2 server3
Resources:
Master: pgsql-ha
Meta Attrs: notify=true
Resource: pgsqld (class=ocf provider=heartbeat type=pgsqlms)
Attributes: bindir=/usr/lib/postgresql/9.6/bin pgdata=/etc/postgresql/9.6/main datadir=/var/lib/postgresql/9.6/main recovery_template=/etc/postgresql/9.6/main/recovery.conf.pcmk pghost=/var/run/postgresql
Operations: start interval=0s timeout=60s (pgsqld-start-interval-0s)
stop interval=0s timeout=60s (pgsqld-stop-interval-0s)
promote interval=0s timeout=30s (pgsqld-promote-interval-0s)
demote interval=0s timeout=120s (pgsqld-demote-interval-0s)
monitor interval=15s role=Master timeout=10s (pgsqld-monitor-interval-15s)
monitor interval=16s role=Slave timeout=10s (pgsqld-monitor-interval-16s)
notify interval=0s timeout=60s (pgsqld-notify-interval-0s)
Resource: pgsql-master-ip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=129.226.179.208 cidr_netmask=32 iflabel=pgrepvip
Meta Attrs: target-role=Started
Operations: start interval=0s timeout=20s (pgsql-master-ip-start-interval-0s)
stop interval=0s timeout=20s (pgsql-master-ip-stop-interval-0s)
monitor interval=30s (pgsql-master-ip-monitor-interval-30s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
migration-threshold: 5
resource-stickiness: 10
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster_pgsql
dc-version: 1.1.16-94ff4df
have-watchdog: false
Quorum:
#root@oreo:~# pcs resource create pgsqld ocf:heartbeat:pgsqlms bindir="/usr/lib/postgresql/9.6/bin" pgdata="/etc/postgresql/9.6/main" datadir="/var/lib/postgresql/9.6/main" recovery_template="/etc/postgresql/9.6/main/recovery.conf.pcmk" pghost="/var/run/postgresql" op start timeout=60s op stop timeout=60s op promote timeout=30s op demote timeout=120s op monitor interval=15s timeout=10s role="Master" op monitor interval=16s timeout=10s role="Slave" op notify timeout=60s
#root@oreo:~# pcs resource master pgsql-ha pgsqld notify=true
pcs resource create pgsql-master-ip ocf:heartbeat:IPaddr2 ip=10.226.179.208 cidr_netmask=32 nic=lo op monitor interval=30s
Hi,
I think I got this error from fencing my cluster nodes. The documentation about fencing is not clear; could you please explain how I can set up fencing? Is fencing important?
Yes, fencing is vital. You can set up either active fencing or passive fencing with a watchdog. See the fencing documentation page on the PAF website.
Now I have a different error.
What you pasted seems identical to me.
I replayed the quick start for Debian 9. I found some small details to adjust there, but nothing related to your issue. And my cluster is up and running smoothly...
Based on your config, you shouldn't have this error. Could you please run the following command and report the result here?
crm_resource --resource pgsql-ha --meta --get-parameter notify 2>/dev/null
On a side note, you must set the same IP address in the primary_conninfo host parameter on every node. This parameter tells the standby how to connect to the primary. As far as I understand your current setup, each node tries to connect to itself... The host should be "129.226.179.208". And based on your setup, I suppose oreo must resolve to "129.226.179.208".
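To illustrate the primary_conninfo remark, here is a minimal sketch of a recovery template that points at the virtual IP instead of the node's own address. It assumes the VIP is 129.226.179.208, as shown for pgsql-master-ip in the `pcs config` output above. `VIP` and `OUT_DIR` are hypothetical variable names used only for this example; on a real node the file would live at the recovery_template path configured for pgsqld.

```shell
#!/bin/sh
# Sketch only: generate the same recovery template on every node.
# VIP and OUT_DIR are illustrative names, not part of PAF or pcs.
VIP="${VIP:-129.226.179.208}"        # assumed: the pgsql-master-ip address
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"   # assumed: scratch dir; use the pgdata dir on a real node

# The heredoc is unquoted, so $VIP and $(hostname -s) are expanded when
# the file is written: host is identical on all nodes, while
# application_name differs per node.
cat <<EOP > "$OUT_DIR/recovery.conf.pcmk"
standby_mode = on
primary_conninfo = 'host=$VIP application_name=$(hostname -s)'
recovery_target_timeline = 'latest'
EOP

echo "wrote $OUT_DIR/recovery.conf.pcmk"
```

Run on each of server1, server2 and server3, this should produce files that differ only in application_name, so every standby connects to whichever node currently holds the virtual IP.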