gardener/machine-controller-manager

[Edge case] MCM rollout stuck if rollout + scale-up triggered together


How to categorize this issue?

/area robustness
/kind bug
/priority 2

What happened:

There is an edge case where, if

  • there is more than one active machineSet (a machineSet with .spec.replicas > 0), and
  • the machineDeployment is updated so that it refers to a new machineClass, and
  • machineDeployment.spec.replicas is increased (the issue does not occur when replicas are decreased),

then MCM panics.
Furthermore, even if the panic does not happen, the rollout gets stuck: the scale-up logic runs before the rollout logic (where the new machineSet is created), so the new machineSet is never created.
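
To illustrate the ordering problem, here is a minimal sketch of a single reconcile pass. It is not MCM's actual controller code: the types `MachineSet` and `MachineDeployment` and the helpers `isScaleUp`, `hasSetForClass` and `reconcile` are hypothetical, simplified stand-ins, and the scale-up check (desired replicas greater than the sum over all machineSets) is an assumption made only for this example.

```go
package main

import "fmt"

// MachineSet and MachineDeployment are simplified, hypothetical stand-ins
// for the real API types; only the fields needed for the sketch are modeled.
type MachineSet struct {
	Name     string
	Class    string
	Replicas int
}

type MachineDeployment struct {
	Class    string
	Replicas int
}

// isScaleUp is an assumed check: the deployment wants more replicas than
// the existing machineSets provide in total.
func isScaleUp(d MachineDeployment, sets []MachineSet) bool {
	total := 0
	for _, ms := range sets {
		total += ms.Replicas
	}
	return d.Replicas > total
}

// hasSetForClass reports whether any machineSet already uses the given class.
func hasSetForClass(sets []MachineSet, class string) bool {
	for _, ms := range sets {
		if ms.Class == class {
			return true
		}
	}
	return false
}

// reconcile models one pass in which the scale-up branch is evaluated first
// and returns early, so the rollout branch (the only place where a machineSet
// for the new class is created) is not reached in that pass.
func reconcile(d MachineDeployment, sets []MachineSet) (string, []MachineSet) {
	if isScaleUp(d, sets) {
		// Scale-only path: existing machineSets are returned unchanged here;
		// no machineSet for the new class is created.
		return "scale", sets
	}
	if !hasSetForClass(sets, d.Class) {
		// Rollout path: create the machineSet for the new machineClass.
		out := append([]MachineSet{}, sets...)
		out = append(out, MachineSet{Name: "ms-new", Class: d.Class})
		return "rollout", out
	}
	return "noop", sets
}

func main() {
	// Two active machineSets, then a class change plus a scale-up (3 -> 5).
	sets := []MachineSet{
		{Name: "ms1", Class: "class-a", Replicas: 2},
		{Name: "ms2", Class: "class-b", Replicas: 1},
	}
	d := MachineDeployment{Class: "class-c", Replicas: 5}

	action, after := reconcile(d, sets)
	fmt.Println(action, hasSetForClass(after, d.Class)) // prints "scale false"
}
```

In this model, the same class change without a replica increase takes the rollout branch and creates the new machineSet, which matches the observation that the problem only shows up when both changes arrive together.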

This needs to be solved in two steps:

  • Address the panic situation
  • #814
  • Allow the rollout to proceed

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

This is a result of the changes introduced in #765.

Environment:
mcm v0.49.0

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

t=0 ms1
t=1 ms1, ms2
t=2 ms1, ms2, ms3 (+scale-up)