voxpupuli/puppet-corosync

Corosync vs Pacemaker: wrong usage of "Corosync"

Opened this issue · 26 comments

Corosync doesn't manage resources. Corosync provides reliable communication between nodes, manages cluster membership and determines quorum. Pacemaker is a cluster resource manager (CRM) that manages the resources that make up the cluster, such as IP addresses, mount points, file systems, DRBD devices, services such as MySQL or Apache and so on. Basically everything that can be monitored, stopped, started and moved around between nodes.

Pacemaker does not depend on Corosync, it could use Heartbeat (v3) for communication, membership and quorum instead. Corosync could also work without Pacemaker, for example with Red Hat's CMAN.

This is a documentation problem but also reflected in the names of the types this module provides. The Linux HA stack and its history as well as various cluster components are already confusing enough so it is important to not mix up terms.

I'll submit a PR for the Readme but I don't think it will be possible to rename the types this module provides at this point.

If the terminology in the module is wrong we should ship a new major version with the right names, and make an incompatible break. I think it matters more to be correct than to be compatible here - especially since this will be an ongoing source of confusion for every person who picks up the module from now to eternity if we don't fix it.

Not that I have the final say, but I would totally plus 💯 a pull request that fixed the type names as well.

What would be a good prefix to use instead of cs_? Maybe pm_ for Pacemaker? Or maybe better, crm_? The types this module provides largely represent the commands of the CRM shell (primitive, group, order, colocation, ...). Also, as far as I can see, the commands wrap calls to /usr/sbin/crm, so crm_ probably makes sense, no?

crm_ has my vote. If I was starting from scratch today that's what I'd go with.

ody commented

Branan Purvine-Riley wrote:

|crm_| has my vote. If I was starting from scratch today that's what
I'd go with.


Reply to this email directly or view it on GitHub
#32 (comment).

crm_ is also my vote. pm_ would be to confusing since there are pacemaker, lsb, and heartbeat providers available. crm_ is the least confusing.

I am currently working on renaming the types from cs_* to crm_* and updating the documentation accordingly. It'll probably take me a day or two to complete this, then I'll submit a PR.

kbon commented

+1 on changing this inconsistency. In my opinion there's more than 1 issue at hand here:

  • most of the cs_* code should, as already mentioned, be moved to a new puppetlabs-pacemaker module (crm_*)
  • the puppetlabs-corosync module should remain, but stripped down to only configure the Corosync service. Some users don't have any need to configure Corosync. For example, I'm using Pacemaker on top of CMAN, which itself configures Corosync already.

@antaflos, shout if you need some help.

@kbon, I agree, this should really become puppetlabs-corosync and puppetlabs-pacemaker.

Sorry for the delay in renaming the types and updating the docs, this is taking me longer than expected, mostly thanks to $dayjob. Is it even worthwhile to do this, since this module should really be split up as mentioned above?

What would be a good prefix to use instead of cs_? Maybe pm_ for Pacemaker? Or maybe better, crm_? The types this module provides largely represent the commands of the CRM shell (primitive, group, order, colocation, ...)

I think that's wrong. The CRM shell commands largely represent objects managed by pacemaker. Those objects are represented in a hierarchy that pacemaker makes available as XML through cibadmin(8). Here's an example of a primitive:

<primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
  <operations>
     <op id="public-ip-startup" name="monitor" interval="0" timeout="90s"/>
     <op id="public-ip-start" name="start" interval="0" timeout="180s"/>
     <op id="public-ip-stop" name="stop" interval="0" timeout="15min"/>
  </operations>
  <instance_attributes id="params-public-ip">
     <nvpair id="public-ip-addr" name="ip" value="1.2.3.4"/>
  </instance_attributes>
</primitive>

crm is a shell on top of this, which translates between XML and its own syntax. That syntax isn't rigorously specified anywhere, and gets really hairy when using the less obvious features of pacemaker, like resource sets.

True, that the providers implemented in this code currently use crm, but that's a separate issue I'd like to change, for those reasons, and also because crm is not the canonical interface to pacemaker, and the providers have eaten the XML already, anyhow. The things being managed really are pacemaker things (a layer below crm), so I'd opt for the prefix pm_, not crm_.

kbon commented

@bitglue: you're suggestion makes loads of sense, especially now the crm shell isn't included in e.g. RHEL6.4+ anymore it wouldn't be logical to use crm_ as prefix, pm_ looks much better in this aspect. Related to this, it would be a great start for a new Pacemaker module...

@kbon @bitglue @antaflos If you want to start on a large refactor of this module I could probably arrange for other repositories (puppetlabs-pacemaker?), merge access, and automatic releases to the forge. Currently our unit testing is run on travis but acceptance tests (with rspec-system) are run locally on our laptops until we get something larger set up.

Just stumbled into this module and whole-heartedly concur. Pacemaker management is out of scope, should be a separate module.

Much of what hunner is discussing applies now that the module is under puppet-community. http://planck.nibalizer.com:5000/modules/puppet-community/puppet-corosync is even janky CI for it

Concerning crm_ vs. pm_ I disagree, by the way.

Yes, crm is a shell and as such not an integral part of pacemaker. But the Cluster Resource Manager is a thing that exists in any case and crmd is one of the components that make pacemaker tick (or beat?)

A pm prefix would be counter-intuitive because as far as I know, nothing in the LinuxHA ecosystem uses it to refer to pacemaker.

Is this still an issue?

@kbon @bitglue @antaflos @hunner @ffrank

I want to give us the ability to rename this module for v6.0.0.

  • puppet-corosync
  • puppet-pacemaker
  • puppet-cluster
  • puppet-clustering
  • puppet-clusterlabs
  • puppet-ha

Also, what do we want for types?

  • cs_*
  • crm_*
  • cluster_*
  • pm_*
  • pacemaker_*
  • ha_*
kbon commented

Hi @roidelapluie, of course I'd gladly give my bike-shedding input, though I haven't been actively contributing here in a long time anymore ;-)

As for naming in general: in our internal infrastructure we made the historical error of introducing the "ha" term for a wrapper module, which just like "cluster" is awfully generic. In the meanwhile we've introduced Monit, not for clustering but for high availability regardless, and someday yet another HA or clustering tool will come along, doesn't align on all concepts (do they ever), and your module just doesn't make sense anymore: I'd avoid that path.

In that regard, I'd rename this module to puppet-pacemaker, as that's what this thing manages (both installation and configuration). If needed, split out corosync-specific stuff to a puppet-corosync module.
As for type names, @ffrank has a good point, crm_ still fits: even the Pacemaker RPM delivers many crm_ prefixed binaries. An alternative would be cib_ as everything ends up editing the cluster information base, but that could give the wrong impression this module edits the cib directly (who knows, someday). Though "pacemaker" isn't as short as "pm", "pm" makes me think of power management. Additionally the pm abbreviation isn't used anywhere else in the Pacemaker tooling afaik. A worthy alternative could be pcmk.

My vote gets shared between the crm, pcmk or pacemaker prefixes.

@kbon https://github.com/openstack/puppet-pacemaker that modules edits the CIB directly.

I am in favour of keeping puppet-corosync as corosync is always there in any stack. Me might rename to crm_ that might make sense. Like crm_resource instead of cs_primitive

@kbon's comments sound very reasonable to me. A cib_ prefix would be kind of neat, but crm_ is likely the better choice.

Renaming (essentially forking in the classic sense) to puppet-pacemaker could be done, but agree that puppet-corosync is close enough.

current status is: keep puppet-corosync -- move resources from cs_ to crm_

What about changing cs_primitive to crm_resource ?

Hmm, what's the reasoning here? I'm mostly impartial, leaning slightly towards sticking with "primitive".

Well primitive is not in the clusterlabs wording, is it?

@roidelapluie Primitive is indeed in the Clusterlabs wording, at least when using CRM. Primitive resources are things like IP addresses, services, mountpoints, etc that are managed using a corresponding resource agent. Non-primitive resources are things like clone, location, colocation and group. You can see this in http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/ which is an older version of Clusters from Scratch that uses CRM, but also in the current edition http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html-single/Clusters_from_Scratch/ (search the page for primitive).

I believe the distinction has been abstracted away now that PCS seems to be the official CRM shell but since this module does not use PCS I think the type should be named crm_primitive, no?

@antaflos this modules uses both pcs and crm depending on the os. You are right, let's keep primitive.