node role/host-group management
mapuri opened this issue · 6 comments
At present cluster-mgr assigns the first node to be commissioned to the service-master
host-group, and subsequent nodes are configured as part of service-worker.
This is obviously static and was done to try out the workflows. There is a need for more user-controlled and/or dynamic host-group management for nodes.
Here are a few considerations to address this issue:
- Use cloud-init, config-drive or Ignition to apply a piece of metadata (i.e. a tag) to the node during the bootstrap process.
- Use Collins built-in or user-defined attributes to dynamically place bootstrapped nodes into Ansible roles based on the cloud-init tag.
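As a rough sketch of the first bullet, a cloud-init user-data fragment could drop a role tag onto the node at bootstrap; the file path and tag value here are hypothetical, just to illustrate the idea:

```yaml
#cloud-config
# Hypothetical sketch: write a role tag during bootstrap so a later
# commissioning step (e.g. via Collins attributes) can read it back.
write_files:
  - path: /etc/cluster/node-tags
    permissions: "0644"
    content: |
      role=service-worker
```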
Feedback?
thanks @danehans
I think there are two parts to this. One is to provide a notion of a node's role/host-group in the cluster-manager API. The second is to integrate with/invoke that API from external systems to make this process more dynamic.
@vvb is working on the former as part of #87 (it is still undergoing some changes based on discussion there, so feel free to chime in there). I think what you mention is the latter. I don't have much experience with systems like cloud-init, Ignition etc, but it will be good to get some early input on integrating with these in case any changes are needed to the cluster API.
@mapuri you mention node's role/host-group, but I believe it's simply the host-group that we care about. If hosts from the inventory are placed in the proper host-group, then the appropriate Ansible roles should be applied to the node. Am I missing something in this workflow?
What about using a concept such as k8s labels/label-selectors? One or more labels can be applied to a node during bootstrap (using Ignition, cloud-init, etc.). A policy of `type: provisioning`
can use a label-selector to link nodes with a provisioning policy that specifies things such as:
- How to provision (manual, auto, etc.)
- The host-group(s) associated with the policy.
- etc.
The label gets stored in cluster-manager as metadata of the node and passed to Collins as an asset tag when requesting an asset create/update/delete.
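To make the idea concrete, a provisioning policy along these lines is what I have in mind; the schema below is purely illustrative, not an existing cluster-manager API:

```yaml
# Hypothetical provisioning policy sketch (schema is illustrative).
kind: policy
type: provisioning
selector:
  matchLabels:
    tier: worker          # label applied to the node at bootstrap via cloud-init/Ignition
spec:
  provision: auto          # how to provision: manual, auto, etc.
  hostGroups:
    - service-worker       # host-group(s) associated with this policy
```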
Thoughts?
/cc: @vvb
you mention node's role/host-group, but I believe it's simply the host-group that we care about.
If hosts from the inventory are placed in the proper host-group, then the appropriate Ansible roles should be applied to the node. Am I missing something in this workflow?
Yes, ultimately a node's role translates into an Ansible host-group. But what I also meant was that a node's role is a first-order primitive today, i.e. cluster-manager associates each node with the appropriate host-group based on the node's role (which can be specified by the user, or derived automatically later). And cluster-manager provides some functionality around node roles as well; for example, when a node is being decommissioned it checks that all workers are brought down before masters, as otherwise it can leave the cluster in a bad shape.
In addition, cluster-manager prescribes/supports a node belonging to one of two roles, viz. master and worker. This is mostly driven by the way distributed infrastructure services are deployed, i.e. they have master components and worker/agent components, which can be easily placed by assigning a node to one of the two roles. I agree that this may be a bit too prescriptive or opinionated, but it has a few advantages:
- it prescribes a structure for the underlying Ansible and keeps the number of combinations of Ansible plays that can be run more controlled and hence testable. We plan to maintain, bundle, and ship this Ansible alongside cluster-manager as a way to deploy services.
- doing the above helps take some of the dynamic logic out of Ansible (which is better suited to stateless provisioning) and move it into cluster-manager. For instance, how to incrementally provision new nodes and add them to an existing cluster, or how to bootstrap the first node in a cluster, are some such scenarios.
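To illustrate the two-role structure, the inventory that cluster-manager effectively maintains boils down to something like this (hostnames are hypothetical; the group names are the ones mentioned above):

```yaml
# Illustrative Ansible inventory: a node's role maps directly to one of
# the two prescribed host-groups (hostnames are hypothetical).
all:
  children:
    service-master:
      hosts:
        node1:
    service-worker:
      hosts:
        node2:
        node3:
```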
What about using a concept such as k8s labels/label-selectors?
...
The label gets stored in cluster-manager as metadata of the node and passed to Collins as an asset tag when requesting an asset create/update/delete.
I agree that a concept like labels can provide the flexibility of specifying varied host-groups, but I am wondering if doing this moves the complexity of figuring out a node's role and running the appropriate Ansible from cluster-manager to the user (and the underlying Ansible). While Ansible is great at provisioning things, being server-less also makes it stateless, which usually makes it less adept for situations where state would help.
Having said that, node labels (specified as part of a cloud-init or Ignition config) can still be useful in general. For instance, by setting them as part of extra-vars
we could control things in the underlying Ansible. We could specify node capabilities, i.e. whether a node has attached storage, or special NICs, etc., and let services be configured accordingly. But this might be a separate discussion.
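For instance, a capability label surfaced as an extra-var could gate a role in a play along these lines; the variable and role names below are hypothetical:

```yaml
# Hypothetical playbook fragment: a capability label passed in as an
# extra-var (e.g. -e has_attached_storage=true) gates a storage role.
- hosts: service-worker
  roles:
    - role: storage-service
      when: has_attached_storage | default(false) | bool
```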
Does this help?