Drain timeout is not respected during client-side throttling
himanshu-kun opened this issue · 2 comments
How to categorize this issue?
/area robustness
/kind bug
/priority 2
What happened:
We observed a case where the drain timeout was set to 2 hours, but the drain ran for 11 hours.
If drainTimeout is 2 hours and podEvictionInterval is 20 seconds, maxEvictRetries is calculated as 360 (2 h / 20 s).
Under client-side throttling, however, each evict call takes much longer, so the interval between two pod eviction requests can exceed 100 seconds.
Because there is currently no context-based cancellation for evictPodWithoutPVInternal, the loop exits only once maxEvictRetries is exhausted. This is the path taken when pod eviction runs into a timeout.
With 360 retries at over 100 seconds each, the drain can take 10 hours or more (11 hours in the observed case), and the machine is never force-deleted.
What you expected to happen:
The drain timeout should be respected in every situation.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Context-based cancellation should be used so that eviction stops once the drain timeout expires.
Environment:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- Others:
@himanshu-kun Label area/todo does not exist.
@elankath Label area/todo does not exist.