ClusterLabs/fence-agents

Azure_ARM fence agent - pcmk_delay_max and priority-fencing-delay

db6thomas opened this issue · 4 comments

To avoid a fence race, the parameters named in the title (pcmk_delay_max and priority-fencing-delay) can be used. References can be found here:

https://docs.microsoft.com/en-us/azure/virtual-machines/workloads/sap/high-availability-guide-rhel-pacemaker
https://www.suse.com/support/kb/doc/?id=000019110
https://access.redhat.com/solutions/5110521

These options do not work with the Azure_ARM fence agent, versions 4.7.1 and 4.9.1.
Both are ignored and a fence race still happens.

Is this on purpose, or is support simply not implemented yet?

These parameters depend on which version of pacemaker you're running.

Can you post the output of rpm -qa | grep pacemaker?

Hello,

azr-sd01:~ # rpm -qa |grep pacemaker
libpacemaker3-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
pacemaker-remote-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
pacemaker-cli-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
libpacemaker-devel-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64
pacemaker-cts-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.noarch
pacemaker-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64

This is the Pacemaker that comes integrated with Db2, which is why you see db2pcmk.x86_64. No changes were made to Pacemaker or Corosync apart from the new packaging.
We tested the same Pacemaker version on AWS, where the parameters work and the fence race can be avoided.
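
For anyone repeating the version check on their own cluster, here is a minimal sketch that pulls the Pacemaker version out of an rpm -qa line and compares it with a minimum. The helper name pacemaker_version and the MIN_VERSION constant are hypothetical. The 2.0.4 threshold is an assumption about when priority-fencing-delay became available and should be checked against the Pacemaker release notes.

```python
import re

def pacemaker_version(rpm_line: str):
    """Return (major, minor, patch) for the main 'pacemaker-X.Y.Z...' package line, else None."""
    # Only the main package matches: 'pacemaker-remote-...' or 'libpacemaker3-...' return None.
    match = re.match(r"pacemaker-(\d+)\.(\d+)\.(\d+)", rpm_line.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

# Assumption: priority-fencing-delay needs at least this Pacemaker version.
MIN_VERSION = (2, 0, 4)

# Package line taken from the rpm output posted in this thread.
line = "pacemaker-2.0.4+20200616.2deceaa3a-1.2.db2pcmk.x86_64"
version = pacemaker_version(line)
supported = version is not None and version >= MIN_VERSION
print(version, supported)
```

Under that assumption, the version posted here meets the minimum, which is consistent with the same build honouring the parameters on AWS.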

pcmk_delay_max, pcmk_delay_base and priority-fencing-delay are applied by pacemaker (fenced) before it executes the action on the fence agent, while other delay parameters are passed to the fence agent itself.
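
The delay that fenced applies before calling the agent can be modelled roughly as follows. This is a simplified sketch, not Pacemaker's actual code. The function name fencing_delay is hypothetical, and it assumes the total delay is a static base (pcmk_delay_base) plus a random part chosen so that the sum does not exceed pcmk_delay_max.

```python
import random

def fencing_delay(delay_base: float, delay_max: float, rng: random.Random) -> float:
    """Simplified model: static base plus a random part, with the total capped at delay_max."""
    if delay_max <= delay_base:
        # No room for a random component, so only the static base applies.
        return delay_base
    # Random component is drawn from the range left between base and max.
    return delay_base + rng.uniform(0, delay_max - delay_base)

rng = random.Random()  # unseeded: every run gives different delays

# Two fencing actions racing against each other (values in seconds, purely illustrative).
delay_a = fencing_delay(0, 15, rng)   # random delay only
delay_b = fencing_delay(5, 15, rng)   # at least 5 seconds, at most 15
print(f"action A waits {delay_a:.1f}s, action B waits {delay_b:.1f}s")
```

When two nodes try to fence each other, the action with the shorter delay runs first, so distinct delays are what break the tie. Because this happens inside fenced, the agent itself never sees these parameters.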

The reason fence_azure_arm behaves differently might be the code that was added to fence_aws to avoid race conditions:
https://github.com/ClusterLabs/fence-agents/pull/323/files

Maybe you should use pcmk_delay_base instead? It sets the base value of a base + random delay.

If you have further issues, you can try the mailing list at http://oss.clusterlabs.org/mailman/listinfo/users, where users and developers of all the ClusterLabs projects can answer.