Question about node taints with regard to doks-managed 'coredns' deployment
Hello 👋
I have a bug report/question regarding the title:
I recently created a new cluster on DO with the following Terraform configuration:
resource "digitalocean_kubernetes_cluster" "main" {
# ...
node_pool {
# ...
labels = {}
tags = []
taint {
key = "x-resource-kind"
value = "apps"
effect = "NoSchedule"
}
}
}
resource "digitalocean_kubernetes_node_pool" "pool-main-storages" {
# ...
labels = {}
tags = []
taint {
key = "x-resource-kind"
value = "storages"
effect = "NoSchedule"
}
}
Basically, I want newly spawned nodes to automatically be given a taint, since I want to control where my current/future pods are scheduled for internal purposes. The cluster & node pools are created fine, and so is the taint:
captain@glados:~$ kubectl describe nodes pool-main-fv5zb
# ...
Taints: x-resource-kind=apps:NoSchedule
# ...
But I noticed that one of the deployments is not running (coredns):
captain@glados:~$ kubectl get deployment -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
cilium-operator 1/1 1 1 10h
coredns 0/2 2 0 10h
captain@glados:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
cilium-operator-98d97cdf6-phw2j 1/1 Running 0 10h
cilium-plbv2 1/1 Running 0 10h
coredns-575d7877bb-9sxdl 0/1 Pending 0 10h
coredns-575d7877bb-pwjtl 0/1 Pending 0 10h
cpc-bridge-proxy-hl55s 1/1 Running 0 10h
konnectivity-agent-dcgsg 1/1 Running 0 10h
kube-proxy-zfn9p 1/1 Running 0 10h
captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 31m (x118 over 10h) default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {x-resource-kind: apps}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
Normal NotTriggerScaleUp 2m10s (x431 over 7h16m) cluster-autoscaler pod didn't trigger scale-up: 1 node(s) had untolerated taint {x-resource-kind: apps}
Is this expected? From the events I understand why it didn't trigger a scale-up; I just don't know whether this is the intended behaviour or not.
Other kube-system pods/deployments are running fine, though; I think that's because their tolerations are set up to "always tolerate everything":
captain@glados:~$ kubectl describe pod/cilium-plbv2 -n kube-system
# ...
Tolerations: op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
# ...
versus
captain@glados:~$ kubectl describe pod/coredns-575d7877bb-9sxdl -n kube-system
# ...
Tolerations: CriticalAddonsOnly op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
# ...
As per the reference
If this is expected, feel free to close this issue. If not, then maybe the default deployment needs to be adjusted? Though I don't know whether that would affect other users.
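In case it helps to illustrate: I could presumably work around this myself by patching a matching toleration into the managed deployment, though I have no idea whether DOKS would reconcile such a change away. A rough, untested sketch:
# Hypothetical workaround: append a toleration for my custom taint key to the
# pod template of the DOKS-managed coredns deployment.
kubectl patch deployment coredns -n kube-system --type=json -p '[
  {
    "op": "add",
    "path": "/spec/template/spec/tolerations/-",
    "value": {
      "key": "x-resource-kind",
      "operator": "Exists",
      "effect": "NoSchedule"
    }
  }
]'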
Hey 👋
Most of the managed workloads that are deployed into the data plane are critical DaemonSets that must or should always run to provide core functionality. CoreDNS is in a bit of a mixed state: it does provide core functionality, but it must also run on a worker node that is considered healthy (and it should be evicted and moved to a healthy node should its hosting node become unhealthy).
I don't think we've revisited the current tolerations in a while, so there's possibly an opportunity to improve here. That said, I'd be hesitant to give CoreDNS a blanket toleration since, for instance, we wouldn't want it to keep running on a node that's under memory pressure.
There have also been requests from customers to support extending the list of tolerations on CoreDNS, to make it better suit custom taints associated with node pools (as you did). This is something we're also considering.
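To make the trade-off concrete, here's a purely illustrative sketch (not a committed design) of a toleration scoped to a customer-defined taint in the CoreDNS pod spec:
tolerations:
  # Scoped: tolerates only the customer-defined taint key, nothing else.
  - key: "x-resource-kind"
    operator: "Exists"
    effect: "NoSchedule"
versus the blanket variant:
tolerations:
  # Blanket: tolerates every taint, including node.kubernetes.io/memory-pressure,
  # which is exactly what we'd want CoreDNS not to shrug off.
  - operator: "Exists"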
With that background shared, I'd be curious to hear what people's preferences are (and why, if it's not obvious). This would help us plan the right next move here.