Fault tolerance in the eks deploy command
yorinasub17 opened this issue · 0 comments
yorinasub17 commented
The eks deploy command should track where it is in the state of the deployment and recover from failures.
For example, it is fairly painful to recoup if the deploy
command fails after the ASG was expanded and during the drain call. In that failure mode, the user has to manually drain all the nodes, and then terminate them. We should figure out a way for the deploy
command to remember where it failed, and then provide the ability to retry starting from that step.