gardener/machine-controller-manager

Fix controller.machineStatusUpdate to retry on conflict

elankath opened this issue · 0 comments

How to categorize this issue?

/area quality
/kind bug
/priority 2

What happened:
Machines are stuck in CrashLoopBackoff and never transitions to Failed if VM startup takes a very long time and context deadline is exceeeded. This is because the Machine obj gets outdated and controller.machineStatusUpdate unfortunately doesn’t use retry on conflict so machine status update to the Failed phase is missed inside controller.machineCreateErrorHandler

What you expected to happen:
Machines are not stuck in CrashLoopBackoff. Machine Status is updated successfully without frequent errors saying "object has been modified.the object has been modified; please apply your changes to the latest version and try again"

How to reproduce it (as minimally and precisely as possible):
set a sleep exceeding the context deadline inside the MC Driver.CreateMachine

Anything else we need to know?:

related #767
Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others: