Corosync is a cluster stack written as a reimplementation of all the core functionalities required by OpenAIS. It is meant to provide 100% correct operation during failures or on partitionable networks.
It is best known as the cluster stack used by Pacemaker to support n-node clusters that can respond to node- and resource-level events.
To install and configure Corosync
class { 'corosync':
  enable_secauth    => true,
  authkey           => '/var/lib/puppet/ssl/certs/ca.pem',
  bind_address      => $ipaddress,
  multicast_address => '239.1.1.2',
}
To enable Pacemaker
corosync::service { 'pacemaker':
  version => '0',
}
The resources that Corosync will manage are referred to as primitives. These are things like virtual IPs or services like drbd, nginx, and apache.
To assign a VIP to a network interface to be used by Nginx
cs_primitive { 'nginx_vip':
  primitive_class => 'ocf',
  primitive_type  => 'IPaddr2',
  provided_by     => 'heartbeat',
  parameters      => { 'ip' => '172.16.210.100', 'cidr_netmask' => '24' },
  operations      => { 'monitor' => { 'interval' => '10s' } },
}
Make Corosync manage and monitor the state of Nginx using a custom OCF agent
cs_primitive { 'nginx_service':
  primitive_class => 'ocf',
  primitive_type  => 'nginx_fixed',
  provided_by     => 'pacemaker',
  operations      => {
    'monitor' => { 'interval' => '10s', 'timeout' => '30s' },
    'start'   => { 'interval' => '0', 'timeout' => '30s', 'on-fail' => 'restart' }
  },
  require         => Cs_primitive['nginx_vip'],
}
Make Corosync manage and monitor the state of Apache using an LSB agent
cs_primitive { 'apache_service':
  primitive_class => 'lsb',
  primitive_type  => 'apache2',
  provided_by     => 'heartbeat',
  operations      => {
    'monitor' => { 'interval' => '10s', 'timeout' => '30s' },
    'start'   => { 'interval' => '0', 'timeout' => '30s', 'on-fail' => 'restart' }
  },
  require         => Cs_primitive['apache2_vip'],
}
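The require above points at an "apache2_vip" primitive that is not declared anywhere in this document. A minimal sketch of what such a declaration might look like, assuming it mirrors "nginx_vip"; the IP address and netmask are placeholders chosen for illustration, not values taken from the module:

```puppet
# Hypothetical VIP for Apache, modelled on the nginx_vip primitive.
# The address 172.16.210.101/24 is an assumed placeholder.
cs_primitive { 'apache2_vip':
  primitive_class => 'ocf',
  primitive_type  => 'IPaddr2',
  provided_by     => 'heartbeat',
  parameters      => { 'ip' => '172.16.210.101', 'cidr_netmask' => '24' },
  operations      => { 'monitor' => { 'interval' => '10s' } },
}
```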
Note: Operations with the same names should be declared as an Array. Example:
cs_primitive { 'pgsql_service':
  primitive_class => 'ocf',
  primitive_type  => 'pgsql',
  provided_by     => 'heartbeat',
  operations      => {
    'monitor' => [
      { 'interval' => '10s', 'timeout' => '30s' },
      { 'interval' => '5s', 'timeout' => '30s', 'role' => 'Master' },
    ],
    'start'   => { 'interval' => '0', 'timeout' => '30s', 'on-fail' => 'restart' }
  },
}
Locations determine on which nodes primitive resources run.
cs_location { 'nginx_service_location':
  primitive => 'nginx_service',
  node_name => 'hostname',
  score     => 'INFINITY',
}
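Pacemaker scores can also be negative. As a hedged sketch, assuming cs_location passes the score through to Pacemaker unchanged, a score of -INFINITY should keep the primitive off a node entirely. The resource title and the node name web02 are illustrative only:

```puppet
# Assumed usage: a negative score to keep nginx_service away from one node.
# 'web02' is a placeholder node name.
cs_location { 'nginx_service_not_on_web02':
  primitive => 'nginx_service',
  node_name => 'web02',
  score     => '-INFINITY',
}
```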
Colocations keep primitives together. If a VIP moves from web01 to web02 because web01 just hit the dirt, it will drag the nginx service with it.
cs_colocation { 'vip_with_service':
  primitives => [ 'nginx_vip', 'nginx_service' ],
}
Colocation defines that a set of primitives must live together on the same node, while order definitions define the order in which each primitive is started. If Nginx is configured to listen only on our VIP, we definitely want the VIP to be migrated to a new node before Nginx comes up, or the migration will fail.
cs_order { 'vip_before_service':
  first   => 'nginx_vip',
  second  => 'nginx_service',
  require => Cs_colocation['vip_with_service'],
}
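The same colocation and order pattern can be applied to the Apache primitives. This is only a sketch: it assumes an "apache2_vip" primitive exists (the one that apache_service requires above), and the resource titles are made up for illustration:

```puppet
# Keep the Apache VIP and service on the same node.
# Titles are illustrative; apache2_vip is assumed to be declared elsewhere.
cs_colocation { 'apache_vip_with_service':
  primitives => [ 'apache2_vip', 'apache_service' ],
}

# Bring the VIP up before starting Apache.
cs_order { 'apache_vip_before_service':
  first   => 'apache2_vip',
  second  => 'apache_service',
  require => Cs_colocation['apache_vip_with_service'],
}
```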
Cloned resources should be active on multiple hosts at the same time. You can clone any existing resource provided the resource agent supports it.
cs_clone { 'nginx_service-clone':
  ensure    => present,
  primitive => 'nginx_service',
  clone_max => 3,
  require   => Cs_primitive['nginx_service'],
}
A few global settings can be changed with the "cs_property" section.
Disable STONITH if required.
cs_property { 'stonith-enabled':
  value => 'false',
}
Change quorum policy
cs_property { 'no-quorum-policy':
  value => 'ignore',
}
A few global settings can be changed with the "cs_rsc_defaults" section.
Don't move resources.
cs_rsc_defaults { 'resource-stickiness':
  value => 'INFINITY',
}
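Other Pacemaker resource defaults can presumably be set the same way. As a sketch, assuming cs_rsc_defaults accepts any Pacemaker resource default by name, the following would set Pacemaker's migration-threshold so that a resource is moved off a node after repeated failures. The value 3 is an arbitrary example:

```puppet
# Assumed usage: move a resource away from a node after 3 failures there.
# 'migration-threshold' is a Pacemaker resource option; the value is illustrative.
cs_rsc_defaults { 'migration-threshold':
  value => '3',
}
```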
Tested and built on Debian 6 using backports, so version 1.4.2 of Corosync is validated to function.
This module doesn't abstract away everything about managing Corosync but makes setup and automation easier. Things that are currently outstanding...
- Needs a lot more tests.
- There are already a handful of bugs that need to be worked out.
- Plus other things, since Corosync and Pacemaker do a lot.
We suggest you at least go read the Clusters from Scratch document from Cluster Labs. It will help you a lot in understanding how all the pieces fit together and point you in the right direction when Corosync fails unexpectedly.
A simple but complete manifest example can be found on Cody Herriges' GitHub, plus there are more incomplete examples spread across the Puppet Labs GitHub.
Copyright (C) 2012 Puppet Labs Inc
Puppet Labs can be contacted at: info@puppetlabs.com
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.