gardener/machine-controller-manager

Drain Timeout is not respected during client side throttling

himanshu-kun opened this issue · 2 comments

How to categorize this issue?

/area robustness
/kind bug
/priority 2

What happened:
A case was seen where the drain timeout was set to 2 hours, but the drain ended up running for 11 hours.
If drainTimeout is 2 hours, then with a podEvictionInterval of 20 seconds we calculate maxEvictRetries as 360.
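The figure of 360 follows from dividing the drain timeout by the eviction interval. The retry budget therefore only matches the timeout if every iteration takes roughly one podEvictionInterval:

$$
\text{maxEvictRetries} = \frac{\text{drainTimeout}}{\text{podEvictionInterval}} = \frac{7200\,\text{s}}{20\,\text{s}} = 360
$$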
Under client-side throttling, the evict call takes a very long time, so the interval between two pod eviction requests can exceed 100 seconds.
Since evictPodWithoutPVInternal currently has no context-based cancellation, the only way out of the loop is for maxEvictRetries to be exhausted. This is what happens when pod eviction keeps running into a timeout.
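With 360 retries spaced more than 100 seconds apart, the loop alone can run for over 10 hours, which matches the 11-hour drain observed. During that time no force deletion of the machine is done.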

What you expected to happen:
The drain timeout should be respected in every situation.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
Context-based cancellation should be used so that pod eviction stops once the drain timeout expires, regardless of how many retries remain.

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:

@himanshu-kun Label area/todo does not exist.

@elankath Label area/todo does not exist.