/aws-node-termination-handler

A Kubernetes Daemonset to gracefully handle EC2 instance shutdown

Primary LanguageGoApache License 2.0Apache-2.0

AWS Node Termination Handler

A Kubernetes Daemonset to gracefully handle EC2 instance shutdown

kubernetes go-version license go-report-card build-status build-status docker-pulls


Project Summary

This project ensures that the Kubernetes control plane responds appropriately to events that can cause your EC2 instance to become unavailable, such as EC2 maintenance events and EC2 Spot interruptions. If not handled, your application code may not have enough time to stop gracefully, take longer to recover full availability, or accidentally schedule work to nodes that are going down. This handler will run a small pod on each host to perform monitoring and react accordingly. When we detect an instance is going down, we use the Kubernetes API to cordon the node to ensure no new work is scheduled there, then drain it, removing any existing work.

The termination handler watches the instance metadata service to determine when to make requests to the Kubernetes API to mark the node as non-schedulable. If the maintenance event is a reboot, we also apply a custom label to the node so when it restarts we remove the cordon.

You can run the termination handler on any Kubernetes cluster running on AWS, including self-managed clusters and those created with Amazon Elastic Kubernetes Service.

Major Features

  • Monitors EC2 Metadata for Scheduled Maintenance Events
  • Monitors EC2 Metadata for Spot Instance Termination Notifications
  • Helm installation and event configuration support
  • Webhook feature to send shutdown or restart notification messages
  • Unit & Integration Tests

Installation and Configuration

The termination handler installs into your cluster a ServiceAccount, ClusterRole, ClusterRoleBinding, and a DaemonSet. All four of these Kubernetes constructs are required for the termination handler to run properly.

Kubectl Apply

You can use kubectl to directly add all of the above resources with the default configuration into your cluster.

kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.7.0/all-resources.yaml

For a full list of releases and associated artifacts see our releases page.

Helm

The easiest way to configure the various options of the termination handler is via helm. The chart for this project is hosted in the eks-charts repository.

To get started you need to add the eks-charts repo to helm

helm repo add eks https://aws.github.io/eks-charts

Once that is complete you can install the termination handler. We've provided some sample setup options below.

Zero Config:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  eks/aws-node-termination-handler

Enabling Features:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining="true" \
  --set enableScheduledEventDraining="false" \
  eks/aws-node-termination-handler

Running Only On Specific Nodes:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set nodeSelector.lifecycle=spot \
  eks/aws-node-termination-handler

Webhook Configuration:

helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set webhookURL=https://hooks.slack.com/services/YOUR/SLACK/URL \
  eks/aws-node-termination-handler

Alternatively, pass Webhook URL as a Secret:

WEBHOOKURL_LITERAL="webhookurl=https://hooks.slack.com/services/YOUR/SLACK/URL"

kubectl create secret -n kube-system generic webhooksecret --from-literal=$WEBHOOKURL_LITERAL
helm upgrade --install aws-node-termination-handler \
  --namespace kube-system \
  --set webhookURLSecretName=webhooksecret \
  eks/aws-node-termination-handler

For a full list of configuration options see our Helm readme.

Use with Kiam

To use the termination handler alongside Kiam requires some extra configuration on Kiam's end. By default Kiam will block all access to the metadata address, so you need to make sure it passes through the requests the termination handler relies on.

To add a whitelist configuration, use the following fields in the Kiam Helm chart values:

agent.whiteListRouteRegexp: '^\/latest\/meta-data\/(spot\/instance-action|events\/maintenance\/scheduled|instance-(id|type)|public-(hostname|ipv4)|local-(hostname|ipv4)|placement\/availability-zone)$'

Or just pass it as an argument to the kiam agents:

kiam agent --whitelist-route-regexp='^\/latest\/meta-data\/(spot\/instance-action|events\/maintenance\/scheduled|instance-(id|type)|public-(hostname|ipv4)|local-(hostname|ipv4)|placement\/availability-zone)$'

Metadata endpoints

The termination handler relies on the following metadata endpoints to function properly:

/latest/meta-data/spot/instance-action
/latest/meta-data/events/maintenance/scheduled
/latest/meta-data/instance-id
/latest/meta-data/instance-type
/latest/meta-data/public-hostname
/latest/meta-data/public-ipv4
/latest/meta-data/local-hostname
/latest/meta-data/local-ipv4
/latest/meta-data/placement/availability-zone

Building

For build instructions please consult BUILD.md.

Communication

Contributing

Contributions are welcome! Please read our guidelines and our Code of Conduct

License

This project is licensed under the Apache-2.0 License.