hashicorp/terraform-aws-nomad

Roll out updates

Xopherus opened this issue

My team recently automated the process of rolling out updates. It consists of two pieces:

  1. A Lambda function which is triggered by a lifecycle hook when an Auto Scaling group wants to terminate a node. This Lambda function gets the instance that is being terminated and uses SSM to drain the Nomad client of jobs. When the client is completely drained, the function completes the lifecycle action so the instance can finish terminating.

  2. A script which orchestrates the deployment. It terminates instances through the Auto Scaling API (instead of EC2) so that each termination triggers the lifecycle hook, which safely drains the Nomad client first. In our case we chose to scale out first so that we always have N clients available, rather than N-1 if we scaled in first.
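A minimal sketch of the first piece in Python with boto3, not the implementation described in this issue. It assumes the function is subscribed to the EventBridge "EC2 Instance-terminate Lifecycle Action" event, that each client runs the SSM agent with a Nomad 0.8+ CLI on its `PATH` (where `nomad node drain -enable` keeps monitoring until the drain finishes), and that the Lambda timeout covers the polling loop. `DRAIN_COMMAND`, `drain_and_release`, and the 10-minute deadline are hypothetical choices.

```python
import time

# Hypothetical drain command. Assumes the Nomad 0.8+ CLI, where
# `nomad node drain -enable` monitors the drain until it finishes.
DRAIN_COMMAND = "nomad node drain -enable -self -yes -deadline 10m"

# SSM command-invocation states that mean "not finished yet".
PENDING_STATES = {"Pending", "InProgress", "Delayed", "Cancelling"}


def drain_and_release(event, ssm, autoscaling, poll_seconds=10, max_polls=80, sleep=time.sleep):
    """Drain the terminating Nomad client via SSM, then complete the lifecycle action."""
    detail = event["detail"]
    instance_id = detail["EC2InstanceId"]

    # Run the drain on the terminating instance itself through SSM Run Command.
    response = ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": [DRAIN_COMMAND]},
    )
    command_id = response["Command"]["CommandId"]

    # Poll until the command leaves the pending states or we run out of polls.
    # Sleeping first gives SSM time to register the invocation.
    status = "Pending"
    for _ in range(max_polls):
        sleep(poll_seconds)
        invocation = ssm.get_command_invocation(CommandId=command_id, InstanceId=instance_id)
        status = invocation["Status"]
        if status not in PENDING_STATES:
            break

    # Complete the action even if the drain failed or is still running, so the
    # instance is not held until the hook's heartbeat timeout expires.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
    return status


def lambda_handler(event, context):
    import boto3  # imported here so drain_and_release can be exercised without AWS

    return drain_and_release(event, boto3.client("ssm"), boto3.client("autoscaling"))
```

Passing the clients in keeps the drain logic testable with fakes. For a terminating hook, `CONTINUE` and `ABANDON` both let termination proceed; the result value matters more for launch hooks.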

I just wanted to open this issue to see if anyone else has feedback on this approach and whether or not you'd like this to be added to that module!
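The orchestration script from the second item isn't shared in this thread. As a rough sketch of its scale-out-first idea, assuming the group's launch configuration or template has already been updated (for example by `terraform apply`) and that `MaxSize` allows doubling the group, something like the following could drive the Auto Scaling API. `roll_out` and `in_service_ids` are hypothetical names, and doubling the whole group at once is just one possible batch size.

```python
import time


def in_service_ids(autoscaling, asg_name):
    """IDs of instances in the group whose lifecycle state is InService."""
    groups = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    instances = groups["AutoScalingGroups"][0]["Instances"]
    return {i["InstanceId"] for i in instances if i["LifecycleState"] == "InService"}


def roll_out(autoscaling, asg_name, poll_seconds=15, max_polls=120, sleep=time.sleep):
    """Scale out to 2N, wait for N new instances, then terminate the N old ones."""
    old_ids = in_service_ids(autoscaling, asg_name)
    n = len(old_ids)

    # Scale out first, so N clients stay available throughout
    # (scaling in first would leave only N-1 while a replacement boots).
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=asg_name, DesiredCapacity=2 * n, HonorCooldown=False
    )

    for _ in range(max_polls):
        if len(in_service_ids(autoscaling, asg_name) - old_ids) >= n:
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"{asg_name}: new instances did not reach InService in time")

    # Terminate through the Auto Scaling API (not EC2) so the terminating
    # lifecycle hook fires and each Nomad client is drained first.
    for instance_id in sorted(old_ids):
        autoscaling.terminate_instance_in_auto_scaling_group(
            InstanceId=instance_id, ShouldDecrementDesiredCapacity=True
        )
    return sorted(old_ids)
```

Setting `ShouldDecrementDesiredCapacity=True` returns the desired capacity to N as each old instance goes away, so the group does not launch replacements for them, while the lifecycle hook holds each one until it is drained.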

> A Lambda function which is triggered by a lifecycle hook when an Auto Scaling group wants to terminate a node.

This seems like a good approach. I've hit some issues in the past with Terraform + lifecycle hooks not firing when you'd expect them to, but it's possible those have been resolved.

> This Lambda function gets the instance that is being terminated and uses SSM to drain the Nomad client of jobs

Why SSM? Why not just call `nomad node-drain -address=<IP_OF_TERMINATING_INSTANCE>` directly from the Lambda function?
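For comparison, a minimal sketch of the direct approach using the Nomad HTTP API from the Lambda function instead of SSM. It assumes Nomad 0.8 or later, ACLs disabled, a function running in the VPC that can reach an agent at a hypothetical `nomad_addr` such as `http://10.0.0.10:4646`, and that `GET /v1/nodes` reports each client's `Address`. On the CLI, `-address` only selects which agent's API to talk to; the node to drain is still identified by node ID (or `-self`), which is why the sketch looks the ID up by IP first. The helper names are hypothetical.

```python
import json
import urllib.request


def find_node_id(nodes, ip):
    """Return the ID of the Nomad client whose advertised Address is `ip`."""
    for node in nodes:
        if node.get("Address") == ip:
            return node["ID"]
    raise LookupError(f"no Nomad node with address {ip}")


def drain_body(deadline_seconds, ignore_system_jobs=False):
    """Request body for POST /v1/node/<id>/drain; Deadline is in nanoseconds."""
    return {
        "DrainSpec": {
            "Deadline": deadline_seconds * 1_000_000_000,
            "IgnoreSystemJobs": ignore_system_jobs,
        }
    }


def drain_complete(node):
    """Assumed check: Nomad clears a node's DrainStrategy once its drain finishes."""
    return node.get("DrainStrategy") is None


def start_drain(nomad_addr, ip, deadline_seconds=600):
    """Look up the client by IP and start draining it; returns the node ID."""
    with urllib.request.urlopen(f"{nomad_addr}/v1/nodes") as resp:
        node_id = find_node_id(json.load(resp), ip)
    request = urllib.request.Request(
        f"{nomad_addr}/v1/node/{node_id}/drain",
        data=json.dumps(drain_body(deadline_seconds)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        resp.read()
    return node_id


def is_drained(nomad_addr, node_id):
    """True once GET /v1/node/<id> shows the drain as finished (see drain_complete)."""
    with urllib.request.urlopen(f"{nomad_addr}/v1/node/{node_id}") as resp:
        return drain_complete(json.load(resp))
```

A Lambda function would call `start_drain`, poll `is_drained` within its timeout, and then complete the lifecycle action. The main trade-off against SSM is reachability: this needs a network path from the function to port 4646, whereas the SSM route needs the SSM agent and an instance role on each client instead.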

> A script which orchestrates the deployment. It terminates instances through the Auto Scaling API (instead of EC2) so that each termination triggers the lifecycle hook, which safely drains the Nomad client first. In our case we chose to scale out first so that we always have N clients available, rather than N-1 if we scaled in first.

Can you describe a bit more what the script is doing?

> whether or not you'd like this to be added to that module!

Yes please!

@Xopherus the automated rollout process sounds very useful for folks running Nomad in AWS. Are you still open to sharing the script that orchestrates the deployment? Specifically, I am curious whether you have tied it back to the Nomad Terraform module somehow, or whether it is completely separate.

@sarkis Yeah, definitely. I think what I have is probably a separate tool; internally we're adapting the overall approach to work with k8s, and will probably do the same with Consul.