NNCP failing if a node has a transitory not ready state.

Question


Closed this issue 3 years ago · 4 comments

What happened:
Applying an NNCP on a cluster that has an unexpectedly NotReady node, one that is about to become Ready again (for example, in the middle of an upgrade), makes the NNCP fail.

What you expected to happen:
kubernetes-nmstate should apply the NNCP to the nodes that are Ready, and wait until the NotReady nodes become Ready before applying it to them.
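The expected behavior can be modeled as a reconcile pass that configures Ready nodes and defers NotReady ones instead of failing. This is a minimal sketch under stated assumptions, not kubernetes-nmstate code: `Node`, `PolicyRollout` and `reconcile` are hypothetical names. The pass is assumed to run again whenever a node's readiness changes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    ready: bool


@dataclass
class PolicyRollout:
    # Hypothetical in-memory model of one policy rollout (not kubernetes-nmstate code).
    configured: set[str] = field(default_factory=set)
    deferred: set[str] = field(default_factory=set)


def reconcile(rollout: PolicyRollout, nodes: list[Node]) -> PolicyRollout:
    """Apply the policy to Ready nodes that are not yet configured; defer NotReady nodes.

    Assumed to be called on every reconcile pass, so deferred nodes are
    configured once they become Ready instead of failing the whole policy.
    """
    for node in nodes:
        if node.name in rollout.configured:
            continue
        if node.ready:
            # Placeholder for actually applying the desired state on this node.
            rollout.configured.add(node.name)
            rollout.deferred.discard(node.name)
        else:
            # Do not fail the policy; remember the node and retry on a later pass.
            rollout.deferred.add(node.name)
    return rollout


nodes = [Node("worker-0", True), Node("worker-1", False)]
r = reconcile(PolicyRollout(), nodes)
print(sorted(r.configured), sorted(r.deferred))  # ['worker-0'] ['worker-1']
nodes[1].ready = True  # worker-1 comes back, e.g. after its upgrade finishes
r = reconcile(r, nodes)
print(sorted(r.configured), sorted(r.deferred))  # ['worker-0', 'worker-1'] []
```

The key design choice is that a NotReady node is a reason to wait, not a reason to fail. The rollout only finishes when nothing is left in `deferred`.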

How to reproduce it (as minimally and precisely as possible):
Take a node down (so that it reports NotReady) and apply an NNCP.

Anything else we need to know?:
It is true that by tuning the nodeSelector one can skip the NotReady nodes, but that only helps when nodes are permanently NotReady (and for that case, affinity would be a better fit than a node selector). For intermittent readiness, the NNCP has to be re-applied once the nodes are Ready again.

Environment:

NodeNetworkState on affected nodes (use kubectl get nodenetworkstate <node_name> -o yaml):
Problematic NodeNetworkConfigurationPolicy:
kubernetes-nmstate image (use kubectl get pods --all-namespaces -l app=kubernetes-nmstate -o jsonpath='{.items[0].spec.containers[0].image}'):
NetworkManager version (use nmcli --version):
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):
Others:

Answer 1 · 2021-10-25T14:06:52.000Z

@qinqon Would it make sense to add a new NNCE state, NodeNotReady, and consider the policy Available even when some NNCEs are in this NodeNotReady state?
If so, should NNCEs in the NodeNotReady state retry periodically?
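The NodeNotReady proposal can be sketched as two small functions. The first aggregates per-node enactment states into a policy status that treats NodeNotReady like Available. The second runs periodically and sends recovered nodes back to Pending so they are retried. This is a simplified model: the enum values and status strings are illustrative, and `policy_status` and `retry_not_ready` are hypothetical helpers, not the kubernetes-nmstate API.

```python
from enum import Enum
from typing import Callable


class EnactmentState(Enum):
    # Simplified enactment states; NODE_NOT_READY is the state proposed in the comment.
    PENDING = "Pending"
    PROGRESSING = "Progressing"
    AVAILABLE = "Available"
    FAILING = "Failing"
    NODE_NOT_READY = "NodeNotReady"


def policy_status(enactments: dict[str, EnactmentState]) -> str:
    """Aggregate per-node enactment states into an illustrative policy status."""
    states = set(enactments.values())
    if EnactmentState.FAILING in states:
        return "Degraded"
    if states & {EnactmentState.PENDING, EnactmentState.PROGRESSING}:
        return "Progressing"
    # Every enactment is Available or NodeNotReady: per the proposal, the policy is Available.
    return "Available"


def retry_not_ready(
    enactments: dict[str, EnactmentState], is_node_ready: Callable[[str], bool]
) -> dict[str, EnactmentState]:
    """Meant to run on a timer: move recovered nodes back to Pending so they are re-enacted."""
    return {
        name: (
            EnactmentState.PENDING
            if state is EnactmentState.NODE_NOT_READY and is_node_ready(name)
            else state
        )
        for name, state in enactments.items()
    }
```

One trade-off of this design is that the policy reports Available while some nodes are still unconfigured. The periodic retry is what eventually closes that gap.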

Answer 2 · 2021-10-25T18:56:17.000Z

An alternative is to use the Pending state and mark the policy as Available only once all the matching nodes have actually been configured. Thinking about it more, I believe this is the better way to go if nodes are usually only temporarily NotReady. Long-term NotReady nodes would still cause the policy to fail.
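The Pending-based alternative can be sketched as follows. The policy becomes Available only when every matching node is configured, stays in progress while NotReady nodes wait, and fails once a node has been NotReady for longer than a grace period. The comment does not say how "long-term" is measured, so `NOT_READY_TIMEOUT_S` is a hypothetical value. `Enactment` and `policy_condition` are likewise illustrative names, not kubernetes-nmstate types.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical grace period; the actual threshold for "long-term NotReady" is not specified.
NOT_READY_TIMEOUT_S = 600.0


@dataclass
class Enactment:
    node: str
    configured: bool = False
    # Timestamp (seconds) when the node was first seen NotReady; None if it never was.
    pending_since: Optional[float] = None


def policy_condition(
    enactments: list[Enactment], now: float, timeout: float = NOT_READY_TIMEOUT_S
) -> str:
    """Available only once every matching node is configured (simplified status strings)."""
    if all(e.configured for e in enactments):
        return "Available"
    for e in enactments:
        if not e.configured and e.pending_since is not None and now - e.pending_since > timeout:
            # The node stayed NotReady too long: the policy still fails.
            return "Failing"
    # Some nodes are still Pending or being configured.
    return "Progressing"
```

Compared with the NodeNotReady idea, this keeps "Available" strict, so it never reports success for an unconfigured node. It relies on the timeout to surface nodes that never come back.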

Answer 3 · 2022-02-08T09:19:57.000Z

Fixed via #981

Answer 4 · 2022-02-08T09:20:16.000Z

/assign @rhrazdil