How to drain nodes when AZ Rebalance event is received
sudo0x opened this issue · 6 comments
I want to run NTH in Queue Processor mode, but I'm unsure how to enable node draining when an AZ rebalance event is received. We are using self-managed ASGs. If we run NTH in Queue Processor mode, do I need to set enableRebalanceDraining and enableRebalanceMonitoring to true to enable draining on AZ rebalance events? Correct me if I'm wrong, but aren't these two flags related to Spot interruption events rather than AZ rebalance? And are they only available in IMDS mode?
If so, how can I enable cordoning and draining when an AZ rebalance event is received?
Hi sudo0x, cordoning and draining in response to AZ Rebalance Recommendation events is enabled by default when running in Queue Processor mode.
So I don't need to enable the enableRebalanceDraining and enableRebalanceMonitoring flags?
Also, I have two EKS clusters and have set up an SQS queue for each cluster. How can I route events only to their respective queues via an EventBridge rule? Is there a cluster name key in the event JSON?
@cjerad is rebalance recommendation handling also enabled by default when running in Queue Processor mode?
I have two EKS clusters and have set up an SQS queue for each cluster. How can I route events only to their respective queues via an EventBridge rule? Is there a cluster name key in the event JSON?
The event does not contain any reference to the EKS cluster; however, it does identify the source ASG (name and ARN) and the lifecycle hook name [1] which are usable in an EventBridge rule.
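As a rough illustration of filtering on the ASG name, here is a minimal Python sketch that builds one EventBridge event pattern per cluster and checks a sample event against it locally. The ASG names, hook name, and instance ID are hypothetical placeholders, and the `matches` helper is a simplified stand-in for EventBridge's matching (exact values only), not the real implementation. It covers only ASG termination lifecycle events.

```python
import json


def asg_termination_pattern(asg_names):
    """Build an event pattern that matches termination lifecycle
    events coming only from the given Auto Scaling groups."""
    return {
        "source": ["aws.autoscaling"],
        "detail-type": ["EC2 Instance-terminate Lifecycle Action"],
        "detail": {"AutoScalingGroupName": list(asg_names)},
    }


def matches(pattern, event):
    """Simplified local check: a pattern leaf is a list of allowed
    exact values; nested dicts are matched recursively. Prefix,
    wildcard, and other EventBridge operators are not handled."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical ASG names, one pattern per cluster's rule.
cluster_a_pattern = asg_termination_pattern(["cluster-a-workers"])
cluster_b_pattern = asg_termination_pattern(["cluster-b-workers"])

# Trimmed-down sample event with placeholder values.
sample_event = {
    "source": "aws.autoscaling",
    "detail-type": "EC2 Instance-terminate Lifecycle Action",
    "detail": {
        "AutoScalingGroupName": "cluster-a-workers",
        "LifecycleHookName": "nth-termination-hook",
        "EC2InstanceId": "i-0123456789abcdef0",
    },
}

# The JSON printed here is what would go into the rule's event pattern.
print(json.dumps(cluster_a_pattern, indent=2))
```

Each cluster's rule would then use its own pattern and target that cluster's SQS queue, so an event from `cluster-a-workers` only reaches the first queue.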
is rebalance recommendation handling also enabled by default when running in Queue Processor mode?
Yes, by default Rebalance Recommendations are enabled when running in Queue Processor mode.
If my understanding is correct, removing the rule for Rebalance Recommendations in EventBridge means NTH in Queue Processor mode will not be able to handle those events, right?
NTH only takes action (e.g. cordoning and draining a node) when a notification is received, so if no notification is received, no action will be taken. This would only prevent the cordoning and draining of the node; it will not prevent the termination of the node, since that is handled by the ASG.