Proposal: multi-node operation support
Addresses #86
Right now clusterm handles requests and events for just one node at a time. This proposal details requirements and changes for adding multi-node operation support.
Desired Behavior
There are two situations where multi-node support needs to be provided, namely user requests and monitor events:
- user requests: these are REST endpoints that clusterm's clients use to perform node-level operations. The following operations/endpoints are required:
- info/nodes:
- This endpoint exists today and returns the info of all the nodes in the inventory.
- This endpoint shall be extended to take the names of one or more nodes and return info for just the specified subset.
- A node that doesn't exist is silently ignored.
- commission/nodes, decommission/nodes, maintenance/nodes:
- these are new endpoints that shall commission, decommission, or upgrade (maintenance), respectively, a subset of nodes
- these endpoints shall take a list of node names
- if one or more of the nodes don't exist, the entire request shall fail
- the specified operation is performed on all nodes together, with the following behavior on failure:
- commission: tries to clean up all nodes in case of a provisioning failure. The user can then rectify the failure and re-post the same request.
- decommission: cleanup never fails; it is best effort
- maintenance: the correct desired behavior is yet to be defined. In the current implementation the nodes are transitioned to the unallocated state, but no other action is taken.
- discover/nodes:
- this is a new endpoint that provisions a set of nodes for discovery
- this endpoint shall take a list of node addresses
- if provisioning fails for one or more nodes, the user will need to correct the failure and post this request again for the failed nodes
- monitor events: these are the node level events generated by the monitor subsystem
- discovered: this updates the inventory state of a node. No provisioning action is associated with this event yet.
- this event shall be changed to accept a list of nodes
- disappeared: this updates the inventory state of a node. No provisioning action is associated with this event yet.
- this event shall be changed to accept a list of nodes
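The two name-resolution rules above differ in an easy-to-miss way: info/nodes filters and silently drops unknown names, while commission/decommission/maintenance reject the whole request on an empty list or any unknown name. A minimal Go sketch of both rules, assuming the inventory can be modeled as a plain map; the `Node` type and the `filterNodes`/`resolveNodes` helpers are hypothetical and not clusterm's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical, trimmed-down inventory record.
type Node struct {
	Name   string
	Status string
}

// filterNodes implements the info/nodes rule: an empty list selects every
// node, and names missing from the inventory are silently ignored.
func filterNodes(inv map[string]Node, names []string) []Node {
	var out []Node
	if len(names) == 0 {
		for _, n := range inv {
			out = append(out, n)
		}
	} else {
		seen := map[string]bool{}
		for _, name := range names {
			if n, ok := inv[name]; ok && !seen[name] {
				seen[name] = true
				out = append(out, n)
			}
		}
	}
	// Sort for a deterministic response order (a choice made for this sketch).
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// resolveNodes implements the commission/decommission/maintenance rule:
// an empty list or any unknown name fails the entire request.
func resolveNodes(inv map[string]Node, names []string) ([]Node, error) {
	if len(names) == 0 {
		return nil, fmt.Errorf("empty node list")
	}
	out := make([]Node, 0, len(names))
	for _, name := range names {
		n, ok := inv[name]
		if !ok {
			return nil, fmt.Errorf("node %q does not exist", name)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	inv := map[string]Node{
		"node1": {Name: "node1", Status: "unallocated"},
		"node2": {Name: "node2", Status: "allocated"},
	}
	fmt.Println(len(filterNodes(inv, nil)))                        // 2
	fmt.Println(len(filterNodes(inv, []string{"node2", "nodeX"}))) // 1
	_, err := resolveNodes(inv, []string{"node1", "nodeX"})
	fmt.Println(err) // node "nodeX" does not exist
}
```

Keeping the two rules in separate functions makes it hard for a handler to accidentally apply the lenient info/nodes behavior to a provisioning request.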
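The commission failure behavior described above (provision all nodes together, and on failure clean up every node so the same request can be re-posted) can be sketched as follows. This is only an illustration under stated assumptions: `provisionFunc` and `cleanupFunc` are hypothetical hooks standing in for the ansible runs, and cleanup is treated as best effort, so it returns no error.

```go
package main

import (
	"errors"
	"fmt"
)

// provisionFunc and cleanupFunc are hypothetical stand-ins for the ansible
// playbook runs; cleanup is best effort and therefore reports no error.
type provisionFunc func(nodes []string) error
type cleanupFunc func(nodes []string)

// errProvision is a hypothetical provisioning failure used in the example.
var errProvision = errors.New("playbook failed on node2")

// commissionAll provisions all nodes together. On failure it cleans up every
// node in the request, not just the failed one, so the user can fix the cause
// and re-post the identical request.
func commissionAll(nodes []string, provision provisionFunc, cleanup cleanupFunc) error {
	if err := provision(nodes); err != nil {
		cleanup(nodes)
		return fmt.Errorf("commission failed, cleaned up %d node(s): %w", len(nodes), err)
	}
	return nil
}

func main() {
	var cleaned []string
	failing := func(nodes []string) error { return errProvision }
	cleanup := func(nodes []string) { cleaned = append(cleaned, nodes...) }

	err := commissionAll([]string{"node1", "node2"}, failing, cleanup)
	fmt.Println(err)     // commission failed, cleaned up 2 node(s): playbook failed on node2
	fmt.Println(cleaned) // [node1 node2]
}
```

Wrapping with `%w` keeps the underlying provisioning error inspectable via `errors.Is` by whoever reports the failure.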
UX considerations
- REST API:
- GET info/nodes
- request body: { nodes: [] }
- empty list shall return info about all nodes
- response body: no change
- response codes: no change
- POST commission/nodes
- request body: { nodes: [] }
- response body: empty
- response codes:
- 200:
- no errors, and the ansible run started successfully
- 500:
- backend validation failure, OR
- an empty list or one or more non-existent nodes were specified
- POST decommission/nodes
- request body: { nodes: [] }
- response body: empty
- response codes:
- 200:
- no errors, and the ansible run started successfully
- 500:
- backend validation failure, OR
- an empty list or one or more non-existent nodes were specified
- POST maintenance/nodes
- request body: { nodes: [] }
- response body: empty
- response codes:
- 200:
- no errors, and the ansible run started successfully
- 500:
- backend validation failure, OR
- an empty list or one or more non-existent nodes were specified
- POST discover/nodes
- request body: { addresses: [] }
- response body: empty
- response codes:
- 200:
- no errors, and the ansible run started successfully
- 500:
- backend validation failure, OR
- an empty list or one or more invalid addresses were specified
- CLI:
- get nodes info:
- clusterctl nodes get [< node-name(s) >]
- commission nodes:
- clusterctl nodes commission [--extra-vars=< extra-vars >] < node-name(s) >
- decommission nodes:
- clusterctl nodes decommission [--extra-vars=< extra-vars >] < node-name(s) >
- upgrade nodes:
- clusterctl nodes maintenance [--extra-vars=< extra-vars >] < node-name(s) >
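The REST contract for POST commission/nodes (a { nodes: [] } body, 200 with an empty body once the ansible run starts, 500 for an empty list or an unknown node) can be exercised with a small Go sketch using `net/http/httptest`. The handler, the in-memory `inventory`, and the JSON key `nodes` are assumptions for illustration; the real clusterm handler and its backend validation are not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// nodesRequest mirrors the proposed { nodes: [] } request body.
type nodesRequest struct {
	Nodes []string `json:"nodes"`
}

// inventory is a hypothetical stand-in for clusterm's node inventory.
var inventory = map[string]bool{"node1": true, "node2": true}

// commissionHandler follows the proposed contract: 500 on a malformed body,
// an empty list or an unknown node; otherwise start the (stubbed) ansible
// run and reply 200 with an empty body.
func commissionHandler(w http.ResponseWriter, r *http.Request) {
	var req nodesRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request body: "+err.Error(), http.StatusInternalServerError)
		return
	}
	if len(req.Nodes) == 0 {
		http.Error(w, "empty node list", http.StatusInternalServerError)
		return
	}
	for _, n := range req.Nodes {
		if !inventory[n] {
			http.Error(w, fmt.Sprintf("node %q does not exist", n), http.StatusInternalServerError)
			return
		}
	}
	// The ansible run would be started asynchronously here.
	w.WriteHeader(http.StatusOK)
}

// post sends body to the handler in-process and returns the status code.
func post(body string) int {
	req := httptest.NewRequest(http.MethodPost, "/commission/nodes", strings.NewReader(body))
	rec := httptest.NewRecorder()
	commissionHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(post(`{"nodes": ["node1", "node2"]}`)) // 200
	fmt.Println(post(`{"nodes": []}`))                 // 500
	fmt.Println(post(`{"nodes": ["node1", "nodeX"]}`)) // 500
}
```

The sketch uses 500 for validation failures because that is what the proposal specifies; a 4xx status such as 400 is the more conventional choice for client-side errors, if that is ever revisited. The same in-process `httptest` pattern fits the "test the 500 error condition paths" item below.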
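The CLI commands map fairly directly onto the REST endpoints: positional node names become the { nodes: [] } body and the optional --extra-vars flag is forwarded, as a query parameter per the discussion in this thread. A hedged Go sketch of that mapping; `buildNodesRequest`, the base URL, and the "extra-vars" query parameter name are assumptions, not clusterctl's actual implementation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// buildNodesRequest turns the node names and optional --extra-vars flag of a
// command such as "clusterctl nodes commission" into the proposed
// POST <base>/<op>/nodes request. A nil extraVars means the flag was not given.
// The "extra-vars" query parameter name is an assumption for this sketch.
func buildNodesRequest(base, op string, extraVars *string, nodes []string) (*http.Request, error) {
	if len(nodes) == 0 {
		return nil, fmt.Errorf("%s: at least one node name is required", op)
	}
	body, err := json.Marshal(map[string][]string{"nodes": nodes})
	if err != nil {
		return nil, err
	}
	u, err := url.Parse(base + "/" + op + "/nodes")
	if err != nil {
		return nil, err
	}
	if extraVars != nil {
		// A present-but-empty value is still sent, so the server can tell
		// it apart from an absent flag.
		q := u.Query()
		q.Set("extra-vars", *extraVars)
		u.RawQuery = q.Encode()
	}
	req, err := http.NewRequest(http.MethodPost, u.String(), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	vars := `{"env":"prod"}`
	req, err := buildNodesRequest("http://localhost:8080", "commission", &vars, []string{"node1", "node2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)          // POST /commission/nodes
	fmt.Println(req.URL.Query().Get("extra-vars")) // {"env":"prod"}
}
```

Rejecting an empty node list on the client side mirrors the server's 500 rule and gives the user an earlier, clearer error.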
System test considerations
- test the success scenario with multiple nodes
- test the 500 error condition paths
- test cleanup on failure of one or more nodes
/cc @vvb for proposal review
/cc @vishal-j for UX
What do you think about passing ansible variables as JSON in the request body for commissioning and discovery?
Hmm, yeah, I think we can add them to the body as well.
Until now I had kept them as query parameters since they are optional, i.e. an empty value of extra-vars still implies a valid value, while an absent extra-vars means do nothing. extra-vars are merged (i.e. global level and per-request level), so this could make a subtle difference in the logic implementation, which I need to re-verify. Also, until now we didn't have a precedent for a request body.
I need to think more, but most likely we can get the current behavior with extra-vars in the request body as well. This will keep the requests consistent, which is good.
Since this will also affect the existing single-node APIs, let me track this as a separate issue so we can bring this change in for all APIs together. Does this work?
Yes. Thanks.
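The subtle point raised in the thread, that an empty extra-vars is a valid value while an absent one means do nothing, can survive the move into a JSON request body. In Go's `encoding/json`, a map field stays nil when its key is absent (or null), while `{}` produces an empty, non-nil map. A minimal sketch, assuming a hypothetical `extra_vars` key and `nodesBody` type; how the two cases are then merged with the global extra-vars is left to the existing logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodesBody is a hypothetical request body with extra-vars moved out of the
// query string. The "extra_vars" key name is an assumption for this sketch.
type nodesBody struct {
	Nodes     []string               `json:"nodes"`
	ExtraVars map[string]interface{} `json:"extra_vars"`
}

// parseBody reports whether extra_vars was supplied. encoding/json leaves the
// map nil when the key is absent (or null) and allocates an empty, non-nil
// map for {}, so "absent" and "empty but valid" stay distinguishable.
func parseBody(data []byte) (nodesBody, bool, error) {
	var b nodesBody
	if err := json.Unmarshal(data, &b); err != nil {
		return b, false, err
	}
	return b, b.ExtraVars != nil, nil
}

func main() {
	for _, in := range []string{
		`{"nodes": ["node1"]}`,
		`{"nodes": ["node1"], "extra_vars": {}}`,
		`{"nodes": ["node1"], "extra_vars": {"env": "prod"}}`,
	} {
		b, present, err := parseBody([]byte(in))
		if err != nil {
			panic(err)
		}
		fmt.Println(present, len(b.ExtraVars))
	}
	// Output:
	// false 0
	// true 0
	// true 1
}
```

Note that an explicit `"extra_vars": null` is indistinguishable from an absent key with this approach; if that case must mean something different, a `json.RawMessage` field would be needed instead.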