SpotToSpotConsolidation is trying to spin up a node which is then not available

Question

woehrl01 opened this issue 7 months ago · 3 comments

Description

Observed Behavior:

With SpotToSpotConsolidation: true, we see that it suggests a smaller spot instance is available and then starts to consolidate the node. However, the smaller spot instance type is not actually available, so creating the node fails. It then creates a bigger instance type again, resulting in a loop.

Expected Behavior:

It should only consolidate if the target instance sizes are actually available.

Reproduction Steps (Please include YAML):

Versions:

Chart Version: 1.0.0
Kubernetes Version (kubectl version): 1.30

Answer 1 · 2024-09-16T20:38:30.000Z

> but the smaller spot instance type is not available, failing to create the node

Can you share the logs from this operation? We definitely shouldn't be deleting the old node before the replacement successfully comes up. Are you seeing the old node getting torn down when you are seeing the bigger instance type get launched or are you seeing us launch double capacity for the same pods?
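To make the expected ordering concrete, here is a minimal sketch of "launch the replacement first, tear down the old node only if that succeeds". It is an illustration under stated assumptions, not the actual consolidation code: `FakeCloud`, `InsufficientCapacityError`, `consolidate`, and the instance type names are all hypothetical.

```python
from dataclasses import dataclass, field


class InsufficientCapacityError(Exception):
    """Hypothetical error: no spot capacity for the requested type."""


@dataclass
class FakeCloud:
    """Hypothetical stand-in for a cloud provider, for illustration only."""
    available: set  # instance types that currently have spot capacity
    running: list = field(default_factory=list)

    def launch(self, instance_type):
        if instance_type not in self.available:
            raise InsufficientCapacityError(instance_type)
        self.running.append(instance_type)

    def terminate(self, instance_type):
        self.running.remove(instance_type)


def consolidate(cloud, old_type, candidates):
    """Try each candidate in order; keep the old node if none can launch."""
    for candidate in candidates:
        try:
            cloud.launch(candidate)
        except InsufficientCapacityError:
            continue  # old node is untouched, try the next candidate
        cloud.terminate(old_type)  # only after the replacement is up
        return candidate
    return old_type  # nothing launched, so nothing is disrupted
```

With this ordering, an unavailable smaller type results in no change at all rather than a teardown followed by a fallback to a bigger instance.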

Answer 2 · 2024-09-17T14:14:56.000Z

Please see some related logs here.

I noticed that it didn't schedule any pods on the node which should be torn down, as it was marked for deletion, effectively preventing any scheduling of new pods.

Answer 3 · 2024-12-10T22:51:02.000Z

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.