k3s-io/helm-controller

Using Helm controller to initialise cilium

PurpleBooth opened this issue · 10 comments

We're using the Helm Controller to initialise Cilium, and we'd like to use the node.cilium.io/agent-not-ready initial taint to prevent pods from being scheduled on nodes before Cilium is ready. However, that taint blocks the helm-install job even for charts marked bootstrap: true. What is the best way to achieve this?

We'd like to avoid having to restart any unmanaged containers. If needed I could try submitting a PR allowing customisation of the taint tolerations, but I don't want to do that if it's not a direction you'd like to go in.

(Kinda hoping there's a cool feature I have missed to avoid this altogether 😄)
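For reference, a minimal sketch of the kind of HelmChart manifest we're using (the repo, name, and values here are illustrative, not our exact config):

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cilium
  namespace: kube-system
spec:
  # bootstrap charts are scheduled onto control-plane nodes with host
  # networking so they can run before the CNI is up
  bootstrap: true
  chart: cilium
  repo: https://helm.cilium.io
  targetNamespace: kube-system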

@brandond Any ideas on the above? Our Cilium deployment on Kube-Hetzner is not working correctly, and we need to know what to do ASAP, please. Guidance would be genuinely appreciated 🙏

@PurpleBooth there isn't a good way to add arbitrary tolerations to the controller at the moment, no. Is the NotReady state on the kubelet, which is present until the CNI comes up, not sufficient for what you're trying to do? Do you really need to inject another taint that mirrors the kubelet's CNI status?

@mysticaltech your ask seems to be unrelated to what @PurpleBooth is trying to accomplish; I'm not sure what kube-hetzner has to do with k3s or our helm controller.

@brandond We use both! Thanks for the details. Will let @PurpleBooth answer her part.

I ran into what I believe is a variant of the same issue today, but in a different situation. I made a mistake while deploying the configuration for the rke2-cilium chart, which took down the Cilium agent on my single node. Now the helm-install pod cannot run due to the agent-not-ready taint, even after I corrected the chart configuration.

It would be really helpful to be able to manage the tolerations for the controller and the pods it generates, and to ship sensible defaults in K3s/RKE2 to avoid such "lock-out" situations.

now the helm-install pod cannot run due to the agent-not-ready taint

The CNI HelmCharts are bootstrap charts, which run with host network and tolerate most things, including the NotReady taint. Are you sure this is the root cause of the failure to recover?

if chart.Spec.Bootstrap {
	job.Spec.Template.Spec.NodeSelector[LabelNodeRolePrefix+LabelControlPlaneSuffix] = "true"
	job.Spec.Template.Spec.HostNetwork = true
	job.Spec.Template.Spec.Tolerations = []corev1.Toleration{
		{
			Key:    corev1.TaintNodeNotReady,
			Effect: corev1.TaintEffectNoSchedule,
		},
		{
			Key:      TaintExternalCloudProvider,
			Operator: corev1.TolerationOpEqual,
			Value:    "true",
			Effect:   corev1.TaintEffectNoSchedule,
		},
		{
			Key:      "CriticalAddonsOnly",
			Operator: corev1.TolerationOpExists,
		},
		{
			Key:      LabelNodeRolePrefix + LabelEtcdSuffix,
			Operator: corev1.TolerationOpExists,
			Effect:   corev1.TaintEffectNoExecute,
		},
		{
			Key:      LabelNodeRolePrefix + LabelControlPlaneSuffix,
			Operator: corev1.TolerationOpExists,
			Effect:   corev1.TaintEffectNoSchedule,
		},
	}

@brandond I believe so; I see a pending helm-install pod like so:

$ kubectl --kubeconfig ~/.kube/rke2_config get event --namespace kube-system --field-selector involvedObject.name=helm-install-rke2-cilium-zwdfb
LAST SEEN   TYPE      REASON             OBJECT                               MESSAGE
28m         Warning   FailedScheduling   pod/helm-install-rke2-cilium-zwdfb   0/1 nodes are available: 1 node(s) had untolerated taint {node.cilium.io/agent-not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
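As a stopgap I could probably un-wedge the node by removing the taint by hand, e.g. kubectl taint nodes <node-name> node.cilium.io/agent-not-ready:NoSchedule- (assuming the NoSchedule effect implied by the scheduling event above), but that doesn't address the underlying problem.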

I took a stab at adding support for customizing tolerations: #221
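Roughly, the idea is to let users add extra tolerations on the HelmChart spec, something like the following (purely illustrative; the field name and shape here are hypothetical and not necessarily what #221 implements):

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  chart: rke2-cilium
  # hypothetical field: extra tolerations applied to the helm-install job pods
  tolerations:
    - key: node.cilium.io/agent-not-ready
      operator: Exists
      effect: NoSchedule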

Oh right, I'd forgotten about the original topic of this issue. Why does cilium add a custom node.cilium.io/agent-not-ready taint, even if you didn't ask for one? That's dumb. The existing kubelet not-ready taint should cover the CNI not being initialized; I guess they feel like they need some extra logic to make sure they're up before even host-network pods are scheduled.

Looks like https://docs.cilium.io/en/stable/installation/taints/ covers their thinking, but I don't agree that it's useful given how RKE2 deploys cilium. I would recommend turning this off with the --set-cilium-node-taints=false operator option. You can set this via operator.extraArgs in the helm chart.

https://github.com/rancher/rke2-charts/blob/main/charts/rke2-cilium/rke2-cilium/1.9.809/charts/cilium/values.yaml#L1121
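On RKE2, something like the following HelmChartConfig should pass that through (a sketch; this assumes the rke2-cilium chart exposes the subchart's operator.extraArgs at the top level, and if the values are nested under the cilium subchart key you'd need to nest accordingly):

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    operator:
      extraArgs:
        - --set-cilium-node-taints=false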

cc @thomasferrandiz @rbrtbnfgl in case y'all have notes on how to best pass this through to the cilium subchart, and thoughts on whether or not this is something we should disable by default - given its propensity to break everything else in the cluster if the operator does add this taint.

The other option is to go around and add cilium taint tolerations to everything that we need to run when the CNI is not up, but that sounds like a lot of work to support a questionable decision on the part of the CNI maintainer.

@brandond thanks for the recommendation, I will try to add the operator arguments and see if it solves my problem.

@brandond I agree, it would make sense to disable cilium's taint by default. We can change that in the next cilium update PR.