contiv-experimental/cluster

Proposal: Support Cluster API abstraction


The Cluster Manager API currently implements node-level functionality, such as commissioning/decommissioning:

https://github.com/contiv/cluster/blob/master/management/src/clusterm/manager/api.go#L19-L23

This proposal suggests implementing top-level API objects required for providing cluster-level operations, such as:

Kind: A string value representing the REST resource this object represents. For example, Node, Cluster, etc.
Spec: Specification of the desired behavior of the cluster/node.
Cluster: A set of one or more nodes that run together as a cluster.
ClusterSpec: A description of a cluster.
Node: A standalone physical/virtual machine with predefined/discovered compute, memory, storage and networking capabilities.
NodeSpec: A description of a node.
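The proposed objects could be sketched as simple typed records. The sketch below is illustrative only (field names such as `cpus` and `memory_mb` are assumptions, not part of the proposal), written in Python for brevity rather than the cluster manager's Go:

```python
from dataclasses import dataclass, field

@dataclass
class NodeSpec:
    """Desired compute/memory/storage/network attributes of a node (illustrative fields)."""
    cpus: int = 0
    memory_mb: int = 0

@dataclass
class Node:
    kind: str = "Node"
    spec: NodeSpec = field(default_factory=NodeSpec)

@dataclass
class ClusterSpec:
    type: str = ""   # e.g. "kubernetes" or "swarm"
    count: int = 0   # number of contiv/cluster nodes to commission

@dataclass
class Cluster:
    kind: str = "Cluster"
    spec: ClusterSpec = field(default_factory=ClusterSpec)

# Example: the 10-node Kubernetes cluster from the manifest below.
cluster = Cluster(spec=ClusterSpec(type="kubernetes", count=10))
```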

Here is an example manifest that can be passed to the Cluster Manager API to instantiate a Kubernetes cluster containing 10 nodes:

kind: Cluster
spec:
  type: kubernetes
  count: 10 # specifies the number of contiv/cluster nodes to commission.
...

Here is an example manifest that can be passed to the Cluster Manager API to instantiate a Swarm cluster containing 5 nodes:

kind: Cluster
spec:
  type: swarm
  count: 5 # specifies the number of contiv/cluster nodes to commission.
...
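Before acting on such a manifest, the Cluster Manager would presumably validate it. A minimal validation sketch, assuming the manifest has already been parsed into a dict (e.g. with a YAML library such as PyYAML) and assuming kubernetes/swarm are the supported types:

```python
SUPPORTED_TYPES = {"kubernetes", "swarm"}  # assumption for illustration

def validate_cluster_manifest(manifest: dict) -> list:
    """Return a list of validation errors; an empty list means the manifest is valid."""
    errors = []
    if manifest.get("kind") != "Cluster":
        errors.append("kind must be 'Cluster'")
    spec = manifest.get("spec", {})
    if spec.get("type") not in SUPPORTED_TYPES:
        errors.append("spec.type must be one of %s" % sorted(SUPPORTED_TYPES))
    count = spec.get("count")
    if not isinstance(count, int) or count < 1:
        errors.append("spec.count must be a positive integer")
    return errors
```

For the swarm manifest above, `validate_cluster_manifest({"kind": "Cluster", "spec": {"type": "swarm", "count": 5}})` would return no errors.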

Additional ClusterSpec attributes can be introduced in the future, such as:

spec:
  os: coreos # the operating system used by nodes in the cluster
  target: baremetal # the target for instantiating a cluster, e.g. baremetal, vagrant, aws, openstack, etc.

/cc: @jainvipin @stephenrlouie @adamschaub

@danehans

thanks for putting together the proposal. The idea of a higher-level API is a good one, especially being able to define properties/attributes of constructs like cluster and node ahead of time.

I understand that this proposal suggests a new higher-level API, but I feel some of it is addressed today. So I thought I would draw your attention to a few things that already exist, and also to some that don't :) but are being worked on and can address some of the asks here.

  • The Global REST endpoint that exists today is meant to allow the user to specify any cluster-level attributes. For instance, this is where the user would typically specify, say, the type of scheduler stack to use or, in the future, the OS to PXE-boot nodes with.
  • The Global and Commission endpoints allow passing ansible variables and hence dictate the behavior of the underlying ansible.
    • This exposes pretty rich functionality to the user in terms of controlling the underlying ansible playbooks.
    • But at the same time it can be a bit overwhelming, as the user might need to know a lot of variables.
    • This can be debated either way with respect to usability and richness of functionality. For now we have taken the path of documenting the common variables, hoping that as long as the variables are self-explanatory and organized they will help the user get the behavior they are looking for. This might not be ideal, but it works for the initial phase while we harden the cluster flows and use cases.
  • Dynamic node role assignment (being tracked by #69). Cluster members are commissioned and assigned to a host-group dynamically. This leads to easier management and scaling of the cluster. For instance, we can add masters on the fly as the cluster grows.

In addition to the above (and slightly unrelated to the proposal), we need a few more low-level things hashed out, like:

  • a way to handle provisioning failures better. ATM I am thinking we can start simple and clean up on failure, as tracked by #85.
  • a way to perform operations on multiple nodes. This is being tracked with #86.
    • A proposal to handle that in the current cluster manager code might just take adding a nodes/commission endpoint that accepts a list of node names. But we need to work out how to handle partial provisioning failures. I think we can start simple, though, and fail (clean up) everything in such cases.
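The all-or-nothing semantics suggested above can be sketched as follows. This is a hypothetical illustration, not the cluster manager's implementation; `commission_node` and `decommission_node` are stand-ins for the real per-node operations:

```python
def commission_nodes(names, commission_node, decommission_node):
    """Commission each node in order; on the first failure, decommission
    (clean up) everything commissioned so far and re-raise the error."""
    done = []
    for name in names:
        try:
            commission_node(name)
            done.append(name)
        except Exception:
            # Partial provisioning failure: fail everything, in reverse order.
            for n in reversed(done):
                decommission_node(n)
            raise
    return done
```

On success the function returns the full list of commissioned nodes; on failure the caller sees the original exception and no node is left half-provisioned.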

Based on the above, I feel the Cluster spec in this proposal gives a structure to the info that is passed as extra-vars to the Global endpoint. There are pros and cons to both approaches (i.e. a structure vs. just a set of variables), but I am preferring variables atm as they keep the cluster API simple. For instance, extra-vars gives an easy way to extend attributes: they are just ansible variables that can be introduced as we add/enable features in the ansible playbooks, without having to change the cluster manager's API much. A Cluster spec, by contrast, ties the features/attributes to the cluster API. The current implementation keeps it simple, as we don't have to version the cluster APIs when we add new features/attributes to the underlying ansible.
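The tradeoff can be illustrated with a small sketch. The variable names below (`scheduler_provider`, `node_count`, `host_os`) are hypothetical, not the actual variables the playbooks consume:

```python
import json

# The same intent, structured spec vs. flat ansible extra-vars.
structured_spec = {"kind": "Cluster", "spec": {"type": "kubernetes", "count": 10}}
extra_vars = {"scheduler_provider": "kubernetes", "node_count": 10}

# Extending behavior with extra-vars is just another key/value pair that the
# playbooks pick up; no cluster manager API (or version) change is needed.
extra_vars["host_os"] = "coreos"

# What might be posted to the Global endpoint.
payload = json.dumps(extra_vars)
```

The structured spec would instead require a schema change (and possibly an API version bump) for the same extension.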

Let me know if this makes sense, or if I missed something basic here.

Thanks for helping me better understand globals and how they apply to Ansible extra-vars. I agree with keeping the API simple and tuning it based on use cases, feedback, etc. I think globals align nicely with concepts such as labels in other systems.

I have reviewed and provided feedback to the other issues you reference.

thanks @danehans

I will close this issue. Let's revisit once we have addressed some of the open items above and we will have a better idea of what else needs to be done.

Sounds like a plan @mapuri