cloud-native-toolkit/software-everywhere

Determine design approach to handle cluster configuration values


Need information such as the ingress subdomain, ingress class, storage class(es), etc.

Options:

  1. Add ClusterV2 interface with additional configuration options and ClusterConfig module that inputs and outputs values
  2. Create a terraform provider for logging into and getting configuration from Kubernetes and OpenShift clusters
  3. Store cluster information in a config file in the GitOps repo

Background

To support additional clusters, whether on different cloud providers, bare metal or generic Kubernetes, there are cases where information about the target cluster is needed for the automation modules to function correctly. This design looks at how we can provide the required information in a way that future-proofs the toolkit and allows expansion to additional cloud environments and cluster types.

It is important that this design does not break the current cluster interface or existing toolkit modules, so it should ensure backward compatibility.

Question: Should we assume that the user of the Toolkit is going to validate the solution against the target cluster (e.g. is some of the software already installed) and adjust the solution accordingly, or should the solution cope with deploying to a cluster that may already have software installed?

Required Information

The two types of modules (terraform vs gitops) have largely the same requirement to know about the target cluster. However, the terraform modules have access to the cluster at apply time so can query the cluster as needed, whereas the gitops modules must not access the target cluster directly; they should only make changes to the gitops git repo. All required information about the target cluster needs to be made available via the solution proposed in this design.

Cluster type

The cluster type can be used to determine whether the cluster is generic Kubernetes or has the additional capabilities provided by OpenShift, so the following is proposed (this should map to the type property of the cluster platform object):

type := kubernetes | openshift

A subtype classification identifies a more specific cluster type (this should map to the type_code property of the cluster platform object). This can be used to identify if a cluster has or doesn’t have a specific capability

type_code := iks | aks | gke | eks | ack | roks | aro | rosa | ocp4 | minikube | kind | ushift | okd | k8s | …

Question: Will this break existing modules that may expect the type_code to only contain kubernetes, ocp3 or ocp4?
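As a rough illustration only (not part of the proposal itself), a cluster module might derive the type value from the type_code along the following lines; the list of OpenShift type_codes and the output shape are assumptions used to keep the sketch self-contained:

```hcl
locals {
  # type_code identifies the specific distribution/deployment of the cluster
  type_code = "roks"

  # type_codes assumed to provide the additional OpenShift capabilities;
  # anything else is treated as generic Kubernetes
  openshift_type_codes = ["roks", "aro", "rosa", "ocp4", "okd", "ushift"]

  type = contains(local.openshift_type_codes, local.type_code) ? "openshift" : "kubernetes"
}

output "platform" {
  description = "Cluster platform information (existing interface shape, extended with the new type values)"
  value = {
    type      = local.type      # kubernetes | openshift
    type_code = local.type_code # iks | aks | gke | eks | ...
  }
}
```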

Ingress

Kubernetes clusters expose services using an Ingress. When creating an Ingress, some information about the cluster is needed:

ingress_class := the ingress class to use when creating Ingress resources
ingress_base_domain := the base domain for generated Ingress host names (this should map to the ingress property of the cluster platform object)

Question: Is a single ingress class enough functionality? IKS defines a public and private ingress. Should we support multiple ingress definitions or just a single one with more complex cases being handled by module parameter overrides?
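For illustration, a module consuming the proposed ingress values through the HashiCorp Kubernetes provider might look like the sketch below; the application name, namespace and variable names mirror the properties above but are otherwise assumptions:

```hcl
variable "ingress_class" {
  type        = string
  description = "Ingress class to use when creating Ingress resources"
}

variable "ingress_base_domain" {
  type        = string
  description = "Base domain appended to generated Ingress host names"
}

# Hypothetical application Ingress built from the cluster-provided values
resource "kubernetes_ingress_v1" "app" {
  metadata {
    name      = "my-app"
    namespace = "my-namespace"
  }

  spec {
    ingress_class_name = var.ingress_class

    rule {
      host = "my-app.${var.ingress_base_domain}"
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "my-app"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```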

Storage

Question: When accessing cluster storage, a deployment can either use the storage class marked as the default or specify a particular storage class. Is there a need to provide details of the available storage classes to modules, or can we assume that the cluster will be set up with a valid default, with module parameters used where that is not the desired behaviour?
Question: Do we need to provide the capability to specify multiple storage classes with different volume/access modes?
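A minimal sketch of the default-versus-explicit behaviour raised in the first question, assuming a hypothetical storage_class module parameter where an empty value means "use the cluster default":

```hcl
variable "storage_class" {
  type        = string
  description = "Storage class to use; an empty string means use the cluster default"
  default     = ""
}

resource "kubernetes_persistent_volume_claim" "data" {
  metadata {
    name      = "module-data"
    namespace = "my-namespace"
  }

  spec {
    access_modes = ["ReadWriteOnce"]

    # Setting storage_class_name to null lets the cluster's default provisioner apply
    storage_class_name = var.storage_class != "" ? var.storage_class : null

    resources {
      requests = {
        storage = "10Gi"
      }
    }
  }
}
```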

Operators

Some modules will install software and services using operators. When creating an operator subscription, details of the catalogs are needed, such as the namespace where the catalog is running and the channel that provides the required operator version.

Question: Are we assuming that the OpenShift default catalogs or the OLM community catalogs will be available for the openshift and kubernetes cluster types, or should we provide the capability for a module to validate that a catalog or an operator is available on a cluster?
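To make the required catalog details concrete, the sketch below creates an OLM Subscription with the kubernetes_manifest resource; the operator name, channel and catalog values are placeholders, and the intent of the proposal is that they would come from the cluster configuration data rather than being hard coded:

```hcl
resource "kubernetes_manifest" "operator_subscription" {
  manifest = {
    apiVersion = "operators.coreos.com/v1alpha1"
    kind       = "Subscription"
    metadata = {
      name      = "example-operator"
      namespace = "openshift-operators"
    }
    spec = {
      name            = "example-operator"       # operator package name
      channel         = "stable"                 # channel providing the operator version
      source          = "redhat-operators"       # catalog source name
      sourceNamespace = "openshift-marketplace"  # namespace where the catalog is running
    }
  }
}
```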

Cluster state

One of the questions at the top of the design asked if modules need to be smart and cope with the existing state of the cluster. If so, there needs to be a way for a module to validate the state of the system. This could be done by:

  • checking if a CSV exists
  • checking if a deployment/pod exists
  • checking the presence of a namespace (project)

If the amount of data is too large to pass to all modules, then this cluster data could be moved to a terraform provider or a separate Toolkit module/interface so that only modules needing this additional data need to specify it as a dependency.

Question: Is this beyond the scope of the toolkit and the responsibility for checking lies with the Toolkit user or should we provide a mechanism for a module to verify the cluster state?
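If module-level state checking is in scope, one possible mechanism (an assumption, not part of the current design) is for a terraform module to shell out through the external data source, for example to test whether a namespace exists; a similar check could cover a CSV or a deployment:

```hcl
# Reports whether the (hypothetical) namespace exists; the external data source
# requires the program to print a JSON object of string values
data "external" "namespace_check" {
  program = [
    "bash", "-c",
    "if kubectl get namespace my-namespace >/dev/null 2>&1; then echo '{\"exists\":\"true\"}'; else echo '{\"exists\":\"false\"}'; fi"
  ]
}

locals {
  namespace_exists = data.external.namespace_check.result["exists"] == "true"
}
```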

Cluster specific data

Question: Is there a need to store additional details for a cluster deployed in a specific environment, e.g. resource group, cloud region, cluster ID? If so, should there be a generic mechanism for storing additional data within the cluster configuration data? This could be an object that can contain any key/value pairs, such as:

additional := an object with free-form key/value pairs that can hold any additional data needed by a specific deployment or cluster type. The type_code should allow modules to know which keys will be available in the ‘additional’ property
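As an illustration, a cluster_config output carrying the ‘additional’ property might look like the following; the keys shown (resource_group, region, cluster_id) are examples for an IBM Cloud ROKS cluster and would differ for other type_code values:

```hcl
output "cluster_config" {
  value = {
    type      = "openshift"
    type_code = "roks"

    # Free-form data specific to this deployment environment/cluster type
    additional = {
      resource_group = "default"
      region         = "eu-gb"
      cluster_id     = "example-cluster-id"
    }
  }
}
```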

Implementation

Could this be implemented in a Kubernetes provider? There are the HashiCorp Kubernetes and Helm providers, as well as providers for specific cloud platforms; however, we would need to write a provider that works across OpenShift and Kubernetes on a variety of deployment options. We would also then need to look at the option of adding additional deployment targets (private and cloud providers).

The Toolkit module approach with the interface facility provides a mechanism to support existing cloud providers and cluster types with a clear path to allow additional cluster types and deployment options to be added.

Creating new interfaces would allow the cluster information from the sections above to be provided. If a property already exists in the cluster interface, then it should retain the name from the cluster interface to support backwards compatibility:

kube_config := the interface for terraform modules that need to be able to login to the cluster (the location of a Kubernetes config file)
cluster_config := the interface holding the cluster state information to be used by both terraform and gitops modules

Question: When creating a new cluster, the data to populate the interface is readily available, but if the Toolkit is used to revisit a previously created cluster, should we regenerate the data (which may not be entirely possible) or should the data be stored on the cluster in a ConfigMap?
Question: How should the data be made available to GitOps modules, where there is no access to the target cluster other than via the GitOps repo?
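A rough sketch of what a module implementing the two proposed interfaces could output; property names beyond type, type_code and the ingress values discussed above are assumptions, and the literal values in the locals block are only there to keep the sketch self-contained (a real module would derive them from the cluster it created or logged in to):

```hcl
locals {
  kube_config_path    = "${path.module}/.kube/config"
  type                = "openshift"
  type_code           = "roks"
  ingress_class       = "nginx"
  ingress_base_domain = "apps.example.cloud"
}

# kube_config interface: enough for terraform modules to log in to the cluster
output "kube_config_path" {
  description = "Location of the Kubernetes config file for the cluster"
  value       = local.kube_config_path
}

# cluster_config interface: cluster information usable by terraform and gitops modules
output "cluster_config" {
  value = {
    type                = local.type
    type_code           = local.type_code
    ingress_class       = local.ingress_class
    ingress_base_domain = local.ingress_base_domain
  }
}
```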

Modules that create a cluster instance

These modules will create a new cluster, similar to the ibm-ocp-vpc and ibm-iks-vpc modules. Each module will target a specific cluster type and deployment environment. The module should create the cluster and implement both the kube_config and cluster_config interfaces, in addition to the existing cluster interface for backward compatibility. These interfaces will make the configuration data available to all subsequent modules in the BOM or solution BOM

Terraform modules using an existing cluster

There are times when the Toolkit needs to use an existing cluster rather than deploy a cluster from scratch. There will be a cluster-login module to support ‘generic’ kubernetes mechanisms for logging onto a cluster (kube config, user certificates, username and password/token). If a cluster is on a cloud provider that has a bespoke cluster login mechanism or has an identity provider configured that doesn’t support the generic Kubernetes login mechanisms, then a specific login module should be created to handle the cluster login.
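For reference, the ‘generic’ login mechanisms map fairly directly onto the HashiCorp Kubernetes provider configuration; a cluster-login module might wire them up along these lines (the variable names are assumptions):

```hcl
variable "kube_config_path" {
  type        = string
  description = "Path to the kube config file used to log in to the cluster"
}

provider "kubernetes" {
  # kube config file login
  config_path = var.kube_config_path

  # alternatively, certificate, token or username/password login:
  # host                   = var.cluster_host
  # cluster_ca_certificate = var.cluster_ca_certificate
  # client_certificate     = var.client_certificate
  # client_key             = var.client_key
  # token                  = var.cluster_token
  # username               = var.cluster_username
  # password               = var.cluster_password
}
```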

The cluster login modules should implement both the kube_config and cluster_config interfaces, in addition to the existing cluster interface for backward compatibility. These interfaces will make the configuration data available to all subsequent modules in the BOM or solution BOM. There may be some properties, such as the type_code or ingress_base_domain that cannot be determined from the cluster in a standard way. One way round this is to store the cluster data in a ConfigMap that is read as part of the cluster login functionality. If the cluster was created using the Toolkit, then the module that created the cluster should populate the ConfigMap.

If the cluster wasn’t created using the Toolkit, then module parameters will be required to provide the missing data at first login, and the login module should add or update the ConfigMap.
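One possible shape for that ConfigMap (an assumption; the name, namespace and keys are illustrative) as written by a cluster-create or cluster-login module:

```hcl
resource "kubernetes_config_map" "cluster_config" {
  metadata {
    name      = "cluster-config"
    namespace = "toolkit"
  }

  data = {
    type                = "openshift"
    type_code           = "roks"
    ingress_class       = "nginx"
    ingress_base_domain = "apps.example.cloud"
  }
}
```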

If the cluster properties end up including cluster state information, such as available catalogs, operators, or details about namespaces and deployments, then this information should be refreshed by the login modules to ensure stale data is not passed to subsequent modules in the BOM.

GitOps modules using an existing cluster

When creating a BOM or solution BOM targeting an existing cluster, there may be no access to the cluster other than via the gitops repo. For this case we need to find a way of making the cluster data available. One option would be to store the cluster data within the gitops repo and then create a gitops-login module to read the cluster data and make it available via the cluster_config interface to subsequent modules in a BOM or solution BOM.

Getting the cluster data into the gitops repo would require a module, such as the gitops-bootstrap module, to have previously populated the data in the repo. If there is cluster data that may become stale, then the gitops-login module could initiate a job via the gitops repo and ArgoCD to update the cluster data before reading it from the git repo.
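A sketch of how a hypothetical gitops-login module could read cluster data that an earlier module (e.g. gitops-bootstrap) stored in the gitops repo, assuming the repo has already been cloned locally and the data lives in a YAML file at a known path (both assumptions):

```hcl
variable "gitops_repo_dir" {
  type        = string
  description = "Path to a local clone of the gitops repo"
}

locals {
  # Assumed location of the cluster data within the gitops repo
  cluster_config_file = "${var.gitops_repo_dir}/cluster/cluster-config.yaml"
  cluster_config      = yamldecode(file(local.cluster_config_file))
}

output "cluster_config" {
  description = "Cluster data read from the gitops repo, exposed via the cluster_config interface"
  value       = local.cluster_config
}
```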

Summary

The design for the provision of cluster information to modules should:

  • allow additional deployment environments (cloud providers) and cluster types to be easily added to the Toolkit
  • provide a standard mechanism to find information about the target cluster that works for both terraform and gitops modules