cortexlabs/cortex

Ensure the cluster autoscaler has the highest pod priority

RobertLucian opened this issue · 0 comments

Description

This is required to ensure that if a pod gets evicted due to low resource availability on a node, the cluster autoscaler can provision another node for the operator node group. When the node's kubelet detects low resource availability (memory), it taints the node with the node.kubernetes.io/memory-pressure:NoSchedule taint, which effectively triggers the autoscaler to add another node for the remaining pending pods.

However, if the cluster autoscaler pod doesn't have the highest priority, there's a chance it gets evicted itself, which would leave the cluster in a permanently broken state: most of the pods may be unable to start on the nodes of the operator node group because they will all be demanding more memory, since:

a) They would all use more memory on startup than usual (given the node's already dwindling resources).
b) The pods may require more memory due to the cluster's large size.
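A minimal sketch of what the fix could look like (the class name and manifest layout here are illustrative, not the actual Cortex manifests): define a PriorityClass with a value higher than any other in-cluster workload and reference it from the cluster autoscaler's pod spec. Alternatively, the built-in system-cluster-critical class could be used, since the autoscaler runs in kube-system.

```yaml
# Hypothetical PriorityClass; name and description are illustrative.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: cluster-autoscaler-priority
value: 1000000000          # maximum value allowed for user-defined classes
globalDefault: false
description: "Ensures the cluster autoscaler is scheduled first and evicted last."
---
# In the cluster autoscaler Deployment's pod template spec:
# spec:
#   template:
#     spec:
#       priorityClassName: cluster-autoscaler-priority
```

With the highest priority, the kubelet evicts the autoscaler pod last under memory pressure, and the scheduler preempts lower-priority pods first if the autoscaler ever needs to be rescheduled.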