aws/karpenter-provider-aws

SpotToSpotConsolidation is trying to spin up a node which is then not available

woehrl01 opened this issue · 3 comments

Description

Observed Behavior:

With SpotToSpotConsolidation: true, Karpenter determines that a smaller spot instance type is available and starts consolidating the node. The smaller spot instance type is not actually available, however, so creating the node fails. Karpenter then creates a bigger instance type again, resulting in a loop.
[Screenshot 2024-09-04 at 09:29:19]

[Screenshot 2024-09-12 at 08:05:42]

[Screenshot 2024-09-12 at 08:18:02]

Expected Behavior:

Karpenter should only consolidate if the target instance types are actually available.

Reproduction Steps (Please include YAML):

Versions:

  • Chart Version: 1.0.0
  • Kubernetes Version (kubectl version): 1.30

> but the smaller spot instance type is not available, failing to create the node

Can you share the logs from this operation? We definitely shouldn't be deleting the old node before the replacement successfully comes up. Are you seeing the old node get torn down when the bigger instance type is launched, or are you seeing us launch double capacity for the same pods?
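
To make the ordering in question concrete, here is a minimal toy model of "launch the replacement first, delete the old node only on success". It is only a sketch and not Karpenter's actual code: `ToyCluster`, `InsufficientCapacity`, `replace_node`, and the instance type names are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass, field


class InsufficientCapacity(Exception):
    """Raised by the toy launcher when an instance type has no spot capacity."""


@dataclass
class ToyCluster:
    # Instance types that currently have spot capacity (made-up data).
    available_types: set
    # Instance types of the nodes currently running.
    nodes: list = field(default_factory=list)

    def launch(self, instance_type):
        # Fail the launch if there is no capacity for the requested type.
        if instance_type not in self.available_types:
            raise InsufficientCapacity(instance_type)
        self.nodes.append(instance_type)

    def replace_node(self, old_type, new_type):
        """Launch the replacement first; remove the old node only on success."""
        try:
            self.launch(new_type)
        except InsufficientCapacity:
            return False  # the old node stays in place, nothing is torn down
        self.nodes.remove(old_type)
        return True


# The smaller type has no capacity, so the old node must survive the attempt.
cluster = ToyCluster(available_types={"m5.2xlarge"}, nodes=["m5.2xlarge"])
replaced = cluster.replace_node("m5.2xlarge", "m5.large")
```

In this model a failed launch leaves `cluster.nodes` unchanged. The logs would show whether the real behavior matches that or whether the old node is removed anyway.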

Here are some related logs:

[Screenshot 2024-09-17 at 16:12:26]

I noticed that no pods were scheduled on the node that was due to be torn down: it was marked for deletion, which effectively prevented any new pods from being scheduled on it.

This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.