oracle/cluster-api-provider-oci

Node taints preventing pods from getting scheduled

mouad-eh opened this issue · 4 comments

What happened:
I created a cluster using the vanilla template (cluster-template.yaml). The cluster was created successfully; however, every pod I create stays in the Pending state forever.

What you expected to happen:
I expect pods to be scheduled and ready when created.

How to reproduce it (as minimally and precisely as possible):

  • Build the CAPI image for OCI: cluster-api-ubuntu-2204-v1.28.9
  • Create a cluster using cluster-template.yaml as the template, with 1 control plane node, 1 worker node, and v1.29.0 as the Kubernetes version.

Anything else we need to know?:
The pods stay in the Pending state because they are never scheduled, due to the taints present on the nodes.
For the control plane node, the following taints are present:

node-role.kubernetes.io/control-plane:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

For the worker node, the taints are as follows:

node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule

I guess the node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule taint should be removed from the nodes once they reach a ready state.
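
To make the reasoning above concrete, here is a minimal sketch of how taints and tolerations interact during scheduling. It is a simplified model written for illustration, not the scheduler's actual code; the `Taint`, `Toleration`, `tolerates`, and `blocking_taints` names are hypothetical. It assumes the usual matching rules: a toleration with operator `Exists` matches any value, an unset key with `Exists` matches every key, and an unset effect matches every effect.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None     # None together with "Exists" matches every key
    operator: str = "Equal"       # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None  # None matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches the given taint."""
    if tol.effect is not None and tol.effect != taint.effect:
        return False
    if tol.key is None:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def blocking_taints(taints: List[Taint], tolerations: List[Toleration]) -> List[Taint]:
    """Return the taints that keep a pod with these tolerations off the node."""
    return [
        t for t in taints
        # Only these two effects stop a new pod from being scheduled.
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


# Taints copied from the issue report.
CONTROL_PLANE = Taint("node-role.kubernetes.io/control-plane", "NoSchedule")
UNINITIALIZED = Taint("node.cloudprovider.kubernetes.io/uninitialized", "NoSchedule", "true")

CONTROL_PLANE_TAINTS = [CONTROL_PLANE, UNINITIALIZED]
WORKER_TAINTS = [UNINITIALIZED]

if __name__ == "__main__":
    # An ordinary pod carries no tolerations, so both nodes reject it.
    for name, taints in (("control plane", CONTROL_PLANE_TAINTS), ("worker", WORKER_TAINTS)):
        blocked = blocking_taints(taints, [])
        print(name, "blocked by:", [t.key for t in blocked])
```

With no tolerations on the pod, both nodes report at least one blocking taint, which matches the Pending behavior described here: no node in the cluster is eligible.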

Environment:

  • CAPOCI version: v0.15.1
  • Cluster-API version (use clusterctl version): v1.7.4
  • Kubernetes version (use kubectl version): v1.28.9
  • Docker version (use docker info):
  • OS (e.g. from /etc/os-release): redhat9

Installing CNI using the link provided in the documentation gave me the following error:

error: resource mapping not found for name: "calico-kube-controllers" namespace: "kube-system" from "https://docs.projectcalico.org/v3.21/manifests/calico.yaml": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
ensure CRDs are installed first

I think I am missing a step where I need to install some CRDs, but I couldn't find it in the documentation.

You may need to use a newer version of Calico that matches your Kubernetes version.

Yes, using the latest version of Calico solved the issue I was having. It seems that the policy/v1beta1 API version of PodDisruptionBudget was removed in v1.25 (it was replaced by policy/v1).
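
As a rough illustration of why the older manifest failed, the sketch below maps a Kubernetes version to the PodDisruptionBudget API versions the API server serves. It relies on two facts from the upstream deprecation guide: policy/v1 became available in v1.21, and policy/v1beta1 stopped being served in v1.25. The function names are hypothetical, and releases that predate PodDisruptionBudget altogether are ignored.

```python
from typing import List, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string such as 'v1.28.9' into (1, 28)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def served_pdb_api_versions(k8s_version: str) -> List[str]:
    """List the PodDisruptionBudget API versions served by this release."""
    v = parse_major_minor(k8s_version)
    served = []
    if v < (1, 25):
        served.append("policy/v1beta1")  # no longer served from v1.25 on
    if v >= (1, 21):
        served.append("policy/v1")       # available from v1.21 on
    return served


if __name__ == "__main__":
    # Versions mentioned in this issue.
    for version in ("v1.28.9", "v1.29.0"):
        print(version, served_pdb_api_versions(version))
```

A manifest that declares its PodDisruptionBudget under policy/v1beta1, as the Calico v3.21 manifest in the error above does, therefore has no matching kind on either version used in this issue.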

For the taints, I was missing the OCI Cloud Controller Manager (CCM). Once I installed it, the nodes were initialized and the taint was removed. (I had only worked with on-prem k8s, so the CCM was new to me.)

Thanks for your help.