Multiarch Manager Operator

The Multiarch Manager Operator (MMO) enhances the experience of administrators of OpenShift clusters with multi-architecture compute nodes, and of Site Reliability Engineers who want to migrate from single-arch to multi-arch OpenShift. When diverse CPU architectures coexist within a cluster, the MMO is a pivotal tool to improve efficiency and streamline operations such as architecture-aware scheduling of workloads.

The development work is still ongoing and there is no official, generally available release yet.

Description

  • Architecture-aware Pod Placement: the pod placement operand automates the inspection of container images, derives the set of architectures supported by a pod, and uses it to define strong predicates based on the kubernetes.io/arch label in the pod's nodeAffinity. This operand is based on KEP-3521 and KEP-3838, as described in the OpenShift enhancement proposal introducing it. When a pod is created, the mutating webhook adds the multiarch.openshift.io/scheduling-gate scheduling gate, which prevents the pod from being scheduled until the controller computes a predicate for the kubernetes.io/arch label, adds it to the pod spec as a node affinity requirement, and removes the scheduling gate, as sketched below.
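
The following is a minimal sketch of that mutation flow; the pod spec fragments and the amd64/arm64 architecture set are illustrative assumptions, not output from a real cluster.

# A pod right after admission: the mutating webhook has added the gate.
spec:
  schedulingGates:
    - name: multiarch.openshift.io/scheduling-gate

# The same pod after the controller has inspected its images (assuming they
# support amd64 and arm64): the gate is removed and this affinity is set.
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - amd64
                  - arm64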

Getting Started

The operator aims to run on any Kubernetes cluster, although the main focus of development and testing is on OpenShift clusters.

Development

Build the operator

# Multi-arch image build
make docker-buildx IMG=<some-registry>/multiarch-manager-operator:tag

# Single arch image build
make docker-build IMG=<some-registry>/multiarch-manager-operator:tag

# Local build
make build

If you use the multi-arch build and want to avoid the deletion of the buildx instance, create an empty .persistent-buildx file in the root of the repository:

touch .persistent-buildx
make docker-buildx IMG=<some-registry>/multiarch-manager-operator:tag
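
When you are done with the persistent builder, remove the marker file and delete the buildx instance manually. The builder name below is a placeholder; use docker buildx ls to find the actual one.

rm .persistent-buildx
docker buildx rm <builder-name>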

Deploy the operator

# Deploy the operator on the cluster
make deploy IMG=<some-registry>/multiarch-manager-operator:tag
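
Once the rollout completes, you can verify that the operator pod is running. The command below assumes the openshift-multiarch-manager-operator namespace, the same one used by the e2e tests later in this document.

kubectl get pods -n openshift-multiarch-manager-operator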

Deploy the Pod Placement Operand

After the operator is running, you can deploy the Pod Placement Operand on the cluster through the PodPlacementConfig CR. It is expected to be a singleton, cluster-scoped resource named cluster.

The following example of a PodPlacementConfig CR sets the log verbosity level to Normal and lets the operand watch and set up CPU architecture node affinities on all pods, except those in namespaces labeled with multiarch.openshift.io/exclude-pod-placement.

Note: the namespaces matching openshift-*, kube-*, and hypershift-* are excluded by default and cannot be opted back in, for safety reasons.

# Deploy the pod placement operand on the cluster
kubectl create -f - <<EOF
apiVersion: multiarch.openshift.io/v1alpha1
kind: PodPlacementConfig
metadata:
  name: cluster
spec:
  logVerbosityLevel: Normal
  namespaceSelector:
    matchExpressions:
      - key: multiarch.openshift.io/exclude-pod-placement
        operator: DoesNotExist
EOF
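
Given this selector, you can opt a namespace out of pod placement by applying the exclusion label. Any value works, since the selector only checks for the key's existence; my-namespace is a placeholder.

# Exclude a namespace from pod placement
kubectl label namespace my-namespace multiarch.openshift.io/exclude-pod-placement=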

Undeploy the PodPlacementConfig operand

kubectl delete podplacementconfig/cluster

Ordered uninstallation of the operand will be implemented in the future: it will remove the pod placement controller only after all the scheduling-gated pods have been ungated.

Until then, you can run the following to ensure the scheduling gate is removed from all the pods:

kubectl get pods -A -l multiarch.openshift.io/scheduling-gate=gated -o json \
  | jq 'del(.items[].spec.schedulingGates[] | select(.name=="multiarch.openshift.io/scheduling-gate"))' \
  | kubectl apply -f -
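
Afterwards, you can confirm that no gated pods remain:

kubectl get pods -A -l multiarch.openshift.io/scheduling-gate=gated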

Uninstall CRDs

To delete the CRDs from the cluster:

make uninstall

Undeploy controller

Undeploy the controller from the cluster:

make undeploy

Modifying the API definitions

If you are editing the API definitions, regenerate the manifests, such as CRs and CRDs, using:

make manifests
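
If you also change the Go type definitions backing the API, a project following the standard Kubebuilder scaffolding (as this one appears to) regenerates the deepcopy code with:

make generate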

NOTE: Run make --help for more information on all potential make targets

More information can be found via the Kubebuilder Documentation

Running tests

The following checks are available as Makefile targets:

# Lint
make lint
# gosec (SAST)
make gosec
# vet
make vet
# goimports
make goimports
# gofmt
make fmt
# Run unit tests
make test
# Run e2e tests (after the operator is deployed, e.g., via `make deploy`)
KUBECONFIG=/path/to/cluster/kubeconfig NAMESPACE=openshift-multiarch-manager-operator make e2e 

All the checks run in a containerized environment by default. You can run them locally by setting the NO_DOCKER variable to 1:

NO_DOCKER=1 make test

or by adding a NO_DOCKER=1 row to the .env file.
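
For example:

echo "NO_DOCKER=1" >> .env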

See the dotenv.example file for other available settings.

How it works

See the OpenShift Enhancement Proposal.

Contributing

// TODO(user): Add detailed information on how you would like others to contribute to this project

License

Copyright 2023 Red Hat, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.