/ha_via_corosync_pacamaker

High availability cluster via corosync and pacemaker


HA Cluster via Corosync/Pacemaker

A cluster of nodes runs a floating IP service and an nginx service. Both nodes are reachable through a shared floating IP. The nginx service is highly available via an active/passive failover scheme. Docker is used to containerize the system.

Instructions

  1. Run start.sh
  2. Open a shell on service_01 with docker exec -ti node_01 /bin/bash (any node works)
  3. Inside service_01, run run.sh, which sets up the cluster and the services
  4. Test the availability of the web service on 172.28.0.100
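
Step 4 can be scripted so that the service is polled while a failover is in progress. The following is a minimal sketch, assuming curl is installed on the host and that nginx answers plain HTTP on the floating IP; the function name poll_service and the FLOATING_URL variable are illustrative and not part of the repository.

```shell
#!/bin/sh
# Poll the floating IP and report whether the web service answers.
# FLOATING_URL is an illustrative default taken from step 4 above.
FLOATING_URL="${FLOATING_URL:-http://172.28.0.100}"

# poll_service URL ATTEMPTS DELAY
# Prints one line per attempt; returns non-zero if any attempt failed.
poll_service() {
    url="$1"
    attempts="$2"
    delay="$3"
    failures=0
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failures; --max-time: do not hang on a dead node
        if curl -fsS --max-time 2 -o /dev/null "$url"; then
            echo "attempt $i: up"
        else
            echo "attempt $i: DOWN"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    [ "$failures" -eq 0 ]
}

# Example: poll_service "$FLOATING_URL" 30 1
```

Running the example line in one terminal while stopping the active node in another should show how long the service stays unreachable during failover.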

Useful commands for monitoring the cluster

pcs status

pcs status corosync

corosync-cmapctl | grep members

pcs status nodes

pcs cluster stop node_01 --force

pcs resource move resource_id [destination_node]

pcs resource failcount show shellscript
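
The commands above can be combined into a simple failover drill. This is only a sketch: it assumes it is run from a cluster node other than the one being stopped, and SETTLE_SECONDS is a hypothetical tunable (10 seconds by default) for how long to wait before checking where the resources ended up.

```shell
#!/bin/sh
# failover_drill NODE
# Stops the cluster services on NODE, waits, then prints the cluster status
# so you can see which node took over the resources.
failover_drill() {
    node="$1"
    pcs cluster stop "$node" --force || return 1
    # Give Pacemaker time to move the resources (assumed value, adjust as needed).
    sleep "${SETTLE_SECONDS:-10}"
    pcs status
}

# Example: failover_drill node_01
```

After the drill, pcs cluster start node_01 brings the stopped node back into the cluster.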

Resources

How to Run Shell Scripts as SystemD services and as Corosync/Pacemaker Resource Agents

  1. Create the script and make it executable
vi /home/shellscript.sh
chmod +x /home/shellscript.sh
  2. Create a SystemD File
vi /lib/systemd/system/shellscript.service

shellscript.service:

[Unit]
Description=My Shell Script

[Service]
ExecStart=/home/shellscript.sh
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
  3. Enable the service
systemctl daemon-reload
systemctl enable shellscript.service
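
To confirm the unit was registered, its state can be queried with systemctl. A minimal sketch follows; check_unit is a hypothetical helper name. The steps above enable the unit but do not start it, so it is expected to report as not active until it is started by hand or by Pacemaker.

```shell
#!/bin/sh
# check_unit UNIT
# Reports whether a systemd unit is enabled and active.
# Returns non-zero if either check fails.
check_unit() {
    unit="$1"
    rc=0
    if systemctl is-enabled --quiet "$unit"; then
        echo "$unit: enabled"
    else
        echo "$unit: NOT enabled"
        rc=1
    fi
    if systemctl is-active --quiet "$unit"; then
        echo "$unit: active"
    else
        echo "$unit: NOT active"
        rc=1
    fi
    return "$rc"
}

# Example: check_unit shellscript.service
```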

Run custom SystemD service as a Corosync/Pacemaker Resource Agent

IMPORTANT NOTE Make sure the services are installed on every node of the cluster!

  1. Check that the RA is in the list of resource agents under the systemd provider (it should also appear as a generic Linux service)
pcs resource list
  2. Create the resource. The command also defines operations on the resource, which let the cluster monitor, start, and stop it.
pcs resource create shellscript_resource systemd:shellscript \
op monitor interval=30s \
op start timeout=180s \
op stop timeout=180s \
op status timeout=15

  • interval: how often the operation is run
  • timeout: if the operation does not complete within this time, it is aborted and considered failed
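
Because the service must be installed on every node, it can help to check for the unit file on each container before creating the resource. The sketch below assumes it is run from the Docker host and that the containers are named node_01 and node_02; only node_01 appears in this README, so the second name is a guess. missing_unit_nodes is a hypothetical helper.

```shell
#!/bin/sh
# Path of the unit file created in the previous section.
UNIT_PATH="${UNIT_PATH:-/lib/systemd/system/shellscript.service}"

# missing_unit_nodes NODE...
# Prints the name of every container in which the unit file is absent.
missing_unit_nodes() {
    for node in "$@"; do
        if ! docker exec "$node" test -f "$UNIT_PATH"; then
            echo "$node"
        fi
    done
}

# Example (container names are assumptions): missing_unit_nodes node_01 node_02
```

An empty output means the unit file is present on all listed nodes.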

Extra

Constraints

You can determine the behavior of a resource in a cluster by configuring constraints for that resource. You can configure the following categories of constraints:

  • location constraints — A location constraint determines which nodes a resource can run on.
  • order constraints — An order constraint determines the order in which the resources run.
  • colocation constraints — A colocation constraint determines where resources will be placed relative to other resources.
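
As an illustration of the three categories, the sketch below creates one constraint of each kind with pcs. The resource names floating_ip and nginx_service are hypothetical placeholders (this README does not state the resource IDs that run.sh creates), and the location score of 50 is an arbitrary example value.

```shell
#!/bin/sh
# Hypothetical resource and node names; replace them with the IDs shown by "pcs status".
VIP_RES="${VIP_RES:-floating_ip}"
WEB_RES="${WEB_RES:-nginx_service}"
PREFERRED_NODE="${PREFERRED_NODE:-node_01}"

apply_constraints() {
    # Location: prefer running the web resource on one node (score 50).
    pcs constraint location "$WEB_RES" prefers "$PREFERRED_NODE=50" || return 1
    # Order: start the floating IP before the web resource.
    pcs constraint order start "$VIP_RES" then start "$WEB_RES" || return 1
    # Colocation: keep the web resource on the node that holds the floating IP.
    pcs constraint colocation add "$WEB_RES" with "$VIP_RES" || return 1
}

# Example: apply_constraints
```

The resulting constraints can be listed afterwards with pcs constraint.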

Alerts

TO-DO: Testing...

pcs alert create id=alertscript path=/home/alertscript.sh
touch /home/test.log
pcs alert recipient add alertscript value=/home/test.log
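
The contents of /home/alertscript.sh are not shown here. One possible body is sketched below, untested against this cluster: Pacemaker passes alert details to the agent in CRM_alert_* environment variables, and CRM_alert_recipient carries the value configured with pcs alert recipient add (here /home/test.log). The function name log_alert and the line format are illustrative choices.

```shell
#!/bin/sh
# Hypothetical body for /home/alertscript.sh.
# Appends one line per alert to the file named by the recipient value.
log_alert() {
    # Fall back to /dev/null when run outside Pacemaker (no recipient set).
    target="${CRM_alert_recipient:-/dev/null}"
    printf '%s kind=%s node=%s rsc=%s desc=%s\n' \
        "${CRM_alert_timestamp:-unknown-time}" \
        "${CRM_alert_kind:-unknown}" \
        "${CRM_alert_node:-}" \
        "${CRM_alert_rsc:-}" \
        "${CRM_alert_desc:-}" >> "$target"
}

log_alert
```

The script must be executable (chmod +x /home/alertscript.sh) and present on every node, since alerts are run on whichever node observes the event.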

Resources

For LSB services:

In case you want to create an OCF Resource Agent:

Alerting: