Fix controller.machineStatusUpdate to retry on conflict
elankath opened this issue · 0 comments
How to categorize this issue?
/area quality
/kind bug
/priority 2
What happened:
Machines are stuck in CrashLoopBackoff and never transition to Failed if VM startup takes a very long time and the context deadline is exceeded. This happens because the Machine object becomes outdated and controller.machineStatusUpdate does not retry on conflict, so the status update to the Failed phase inside controller.machineCreateErrorHandler is missed.
What you expected to happen:
Machines are not stuck in CrashLoopBackoff. Machine status is updated successfully, without frequent errors saying "the object has been modified; please apply your changes to the latest version and try again".
How to reproduce it (as minimally and precisely as possible):
Set a sleep that exceeds the context deadline inside the MC Driver.CreateMachine.
Anything else we need to know?:
Related: #767
Environment:
- Kubernetes version (use kubectl version):
- Cloud provider or hardware configuration:
- Others: