post-start should check Node status.Conditions (expecting Status=false)
poblin-orange opened this issue · 0 comments
To benefit from the bosh canary / max-in-flight mechanism, the bosh release should check every k8s node's status.conditions during bosh post-start.
The expected state is Status=false. Status=true should result in a post-start failure, thus preventing further impact on the following instance groups.
eg: kubectl wait --for=condition=Ready node/agents-concourse-r1-z1-0 --timeout=10s
conditions:
- lastHeartbeatTime: "2023-08-29T17:04:07Z"
  lastTransitionTime: "2023-08-29T17:04:07Z"
  message: Cilium is running on this node
  reason: CiliumIsUp
  status: "False"
  type: NetworkUnavailable
- lastHeartbeatTime: "2023-09-20T15:35:02Z"
  lastTransitionTime: "2023-09-09T23:53:50Z"
  message: kubelet is posting ready status. AppArmor enabled
  reason: KubeletReady
  status: "True"
  type: Ready
Note that Ready has a negated Status compared to the other conditions: for Ready, Status=true should be expected.
https://kubernetes.io/docs/reference/node/node-status/#condition
Node condition Ready: True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds).
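A minimal sketch of how a post-start script could apply the rule described above: Ready must be True, and every other condition must be False. The function names node_conditions and check_conditions are hypothetical and not part of the release; treating Unknown as a failure is also an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of the bosh release.

# Print one "Type=Status" line per condition of the given node.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Read "Type=Status" lines on stdin. Succeed only if Ready is True and
# every other condition is False. Any other value (including Unknown)
# is reported on stderr and makes the function fail.
check_conditions() {
  local line type status rc=0 seen_ready=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    type=${line%%=*}
    status=${line#*=}
    if [ "$type" = "Ready" ]; then
      seen_ready=1
      [ "$status" = "True" ] && continue
    else
      [ "$status" = "False" ] && continue
    fi
    echo "unhealthy condition: ${type}=${status}" >&2
    rc=1
  done
  # A node that reports no Ready condition at all is treated as unhealthy.
  if [ "$seen_ready" -eq 0 ]; then
    echo "no Ready condition reported" >&2
    rc=1
  fi
  return "$rc"
}
```

In post-start this could be invoked as: node_conditions agents-concourse-r1-z1-0 | check_conditions || exit 1. Because every non-Ready condition is expected to be False, conditions added by third-party components are covered without listing them explicitly.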
Sample standard node conditions are documented at https://kubernetes.io/docs/reference/node/node-status/#condition
Additional node conditions can be set by third-party components such as node-problem-detector; see https://kubernetes.io/docs/tasks/debug/debug-cluster/monitor-node-health/#exporter
"condition": "KernelDeadlock",