gardener/machine-controller-manager

[Regression] CrashLoopBackOff machines don't turn to Running quickly

himanshu-kun opened this issue · 1 comment

How to categorize this issue?

/area quality
/kind bug
/priority 3

What happened:

Due to PR #745, CrashLoopBackOff machines don't turn to Running as soon as the node object becomes Ready.

Details:

When a machine creation fails, the machine turns to CrashLoopBackOff. This leads to reconciliation being retried, and triggerCreationFlow is called again in every reconciliation until VM creation succeeds.
In our case, once it succeeds, the spec-related fields such as the node label and providerID are updated, and the machine is sent for reconciliation again.
In this next reconciliation, it enters reconcileMachineHealth(), but because the node object is not yet registered, the machine is requeued with a LongRetry (10 min).

Also, when the node object is created, the event is ignored because of this check in place, so it does not trigger an earlier reconciliation.

This leads to the machine staying in the CrashLoopBackOff phase; it turns to Running only after the 10 min retry elapses, even if the node object has successfully joined and become Ready well before that.

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

@himanshu-kun Label area/todo does not exist.