gruntwork-io/kubergrunt

Fault tolerance in the eks deploy command

yorinasub17 opened this issue · 0 comments

The eks deploy command should track where it is in the state of the deployment and recover from failures.

For example, it is fairly painful to recoup if the deploy command fails after the ASG was expanded and during the drain call. In that failure mode, the user has to manually drain all the nodes, and then terminate them. We should figure out a way for the deploy command to remember where it failed, and then provide the ability to retry starting from that step.