spot-termination-exporter

Prometheus spot instance exporter to monitor AWS instance termination with Hollowtrees

Primary language: Go. License: Apache License 2.0.

Spot instance termination exporter

Prometheus exporters are used to export metrics from third-party systems as Prometheus metrics. This exporter scrapes the AWS instance metadata service for spot instance termination notices and rebalance recommendations.

Status Of This Repository

This repository is a maintained fork of banzaicloud/spot-termination-exporter with a small number of changes due to the lack of activity in the upstream:

  1. The addition of instance_type labels to metrics relating to instance termination and rebalance recommendations to allow for analysis of metrics by instance type
  2. The addition of a metric for rebalance recommendation events from the metadata service
  3. The move from dep to Go modules, together with updates to the Go version and the base Docker images

Images for this fork are published to GitHub's container registry and are available under ghcr.io/gjtempleton/spot-termination-exporter.

Spot instance lifecycle

  • User submits a bid to run a desired number of EC2 instances of a particular type. The bid includes the price that the user is willing to pay to use the instance for an hour.
  • If the bid price exceeds the current spot price (which AWS determines based on current supply and demand), the instances are started.
  • If the current spot price rises above the bid price, or there is no available capacity, the spot instance is interrupted and reclaimed by AWS. Two minutes before the interruption, the internal metadata endpoint on the instance is updated with the termination info.
  • If the instance is interrupted, the action taken by AWS depends on the interruption behaviour (terminate, stop or hibernate) and the request type (one-time or persistent). These can be configured when requesting the instance. See the AWS documentation on Spot Instance interruptions for details.

Spot instance termination notice

The Termination Notice is accessible to code running on the instance via the instance’s metadata at http://169.254.169.254/latest/meta-data/spot/termination-time. This field becomes available when the instance has been marked for termination and will contain the time when a shutdown signal will be sent to the instance’s operating system. At that time, the Spot Instance Request’s bid status will be set to marked-for-termination. The bid status is accessible via the DescribeSpotInstanceRequests API for use by programs that manage Spot bids and instances.
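As an illustration of the lookup described above, here is a minimal Go sketch that polls the termination-time field once. It is not the exporter's own code. The function names `parseTerminationTime` and `fetchTerminationTime` are hypothetical, and the sketch assumes that the field returns HTTP 404 while no termination is scheduled, that the timestamp is in RFC 3339 format, and that the metadata service accepts plain unauthenticated GET requests (instances that enforce IMDSv2 session tokens would need an extra step).

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// parseTerminationTime interprets a response from spot/termination-time.
// Assumption: 404 means no termination is scheduled, and 200 carries an
// RFC 3339 timestamp in the body.
func parseTerminationTime(status int, body string) (at time.Time, scheduled bool, err error) {
	switch status {
	case http.StatusNotFound:
		return time.Time{}, false, nil
	case http.StatusOK:
		at, err = time.Parse(time.RFC3339, strings.TrimSpace(body))
		if err != nil {
			return time.Time{}, false, fmt.Errorf("parsing termination time: %w", err)
		}
		return at, true, nil
	default:
		return time.Time{}, false, fmt.Errorf("unexpected status %d", status)
	}
}

// fetchTerminationTime queries the metadata endpoint at baseURL, which is
// expected to end in "/", and reports whether a termination is scheduled.
func fetchTerminationTime(client *http.Client, baseURL string) (time.Time, bool, error) {
	resp, err := client.Get(baseURL + "spot/termination-time")
	if err != nil {
		return time.Time{}, false, err
	}
	defer resp.Body.Close()
	// The body is a single timestamp, so cap how much is read.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1024))
	if err != nil {
		return time.Time{}, false, err
	}
	return parseTerminationTime(resp.StatusCode, string(body))
}

func main() {
	// A short timeout keeps the program from hanging when run outside EC2.
	client := &http.Client{Timeout: 2 * time.Second}
	at, scheduled, err := fetchTerminationTime(client, "http://169.254.169.254/latest/meta-data/")
	switch {
	case err != nil:
		fmt.Println("metadata query failed:", err)
	case scheduled:
		fmt.Printf("termination at %s (in %.0fs)\n", at.Format(time.RFC3339), time.Until(at).Seconds())
	default:
		fmt.Println("no termination scheduled")
	}
}
```

Keeping the parsing separate from the HTTP call makes the interpretation of the response easy to check without access to a real instance.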

Spot instance rebalance recommendations

Rebalance recommendations are advance notice that a given spot instance is at elevated risk of interruption. They can be accessed either via Amazon EventBridge or via the instance metadata endpoint. A number of AWS tools handle rebalance recommendations automatically, for instance EKS managed node groups.
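To make the metadata route concrete, the following Go sketch decodes a rebalance recommendation document. It assumes the events/recommendations/rebalance path returns a JSON object with a single `noticeTime` field holding an RFC 3339 timestamp. The type and function names are hypothetical and the sample payload is made up for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// rebalanceRecommendation mirrors the assumed JSON body of
// events/recommendations/rebalance.
type rebalanceRecommendation struct {
	NoticeTime time.Time `json:"noticeTime"`
}

// parseRebalanceRecommendation decodes the body and rejects documents
// that do not carry a notice time.
func parseRebalanceRecommendation(body []byte) (rebalanceRecommendation, error) {
	var rec rebalanceRecommendation
	if err := json.Unmarshal(body, &rec); err != nil {
		return rebalanceRecommendation{}, fmt.Errorf("decoding rebalance recommendation: %w", err)
	}
	if rec.NoticeTime.IsZero() {
		return rebalanceRecommendation{}, errors.New("rebalance recommendation has no noticeTime")
	}
	return rec, nil
}

func main() {
	// Sample payload, invented for this example.
	sample := []byte(`{"noticeTime": "2020-10-27T08:22:00Z"}`)
	rec, err := parseRebalanceRecommendation(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("rebalance recommended since", rec.NoticeTime.Format(time.RFC3339))
}
```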

Quick start

The project is built with promu, the Prometheus utility tool, which must be installed first. To install promu and build the exporter:

go install github.com/prometheus/promu@latest
promu build

The following options can be configured when starting the exporter:

./spot-termination-exporter --help
Usage of ./spot-termination-exporter:
  -bind-addr string
        bind address for the metrics server (default ":9189")
  -log-level string
        log level (default "info")
  -metadata-endpoint string
        metadata endpoint to query (default "http://169.254.169.254/latest/meta-data/")
  -metrics-path string
        path to metrics endpoint (default "/metrics")
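The help output above resembles what Go's standard flag package prints, so a rough sketch of how these four options could be declared is shown below. This is an assumption about the mechanism, not a copy of the exporter's source. The `config` struct and `parseFlags` function are hypothetical names. The flag names, defaults and descriptions are taken from the help text.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config collects the four documented options.
type config struct {
	BindAddr         string
	LogLevel         string
	MetadataEndpoint string
	MetricsPath      string
}

// parseFlags parses args (without the program name) into a config,
// applying the documented defaults for anything not supplied.
func parseFlags(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("spot-termination-exporter", flag.ContinueOnError)
	fs.StringVar(&cfg.BindAddr, "bind-addr", ":9189", "bind address for the metrics server")
	fs.StringVar(&cfg.LogLevel, "log-level", "info", "log level")
	fs.StringVar(&cfg.MetadataEndpoint, "metadata-endpoint",
		"http://169.254.169.254/latest/meta-data/", "metadata endpoint to query")
	fs.StringVar(&cfg.MetricsPath, "metrics-path", "/metrics", "path to metrics endpoint")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		// The flag package has already printed the problem and the usage text.
		os.Exit(2)
	}
	fmt.Printf("%+v\n", cfg)
}
```

The standard flag package accepts both the single-dash and double-dash spellings, so `-log-level debug` and `--log-level debug` are treated the same way in this sketch.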

Test locally

The AWS instance metadata is available at http://169.254.169.254/latest/meta-data/, and this is the endpoint the exporter queries by default. Because it is hard to reproduce a termination notice or rebalance recommendation on a real AWS instance, the metadata endpoint can be changed in the configuration. A test server in the util directory mocks the behavior of the metadata endpoint: it listens on port 9092 and provides dummy responses for /instance-id, /spot/instance-action, /instance-type, and /events/recommendations/rebalance. It can be started with:

go run util/test_server.go

The exporter can be started with this configuration to query this endpoint locally:

./spot-termination-exporter --metadata-endpoint http://localhost:9092/latest/meta-data/ --log-level debug
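The same mocking idea can also be applied in-process, for example in unit tests, with Go's net/http/httptest package. The sketch below is not the repository's util/test_server.go. The handler, the `lookup` helper and all canned response bodies are invented for illustration, and it assumes the four paths are served under /latest/meta-data/.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const metadataPrefix = "/latest/meta-data/"

// mockResponses maps metadata paths to canned bodies.
// All values here are made up for this example.
var mockResponses = map[string]string{
	"instance-id":                      "i-0d2aab13057917887",
	"instance-type":                    "c5.9xlarge",
	"spot/instance-action":             `{"action": "stop", "time": "2017-09-18T08:22:00Z"}`,
	"events/recommendations/rebalance": `{"noticeTime": "2020-10-27T08:22:00Z"}`,
}

// lookup returns the canned body for a request path, if one is defined.
func lookup(urlPath string) (string, bool) {
	if !strings.HasPrefix(urlPath, metadataPrefix) {
		return "", false
	}
	body, ok := mockResponses[strings.TrimPrefix(urlPath, metadataPrefix)]
	return body, ok
}

// handler serves canned bodies and answers 404 for everything else.
func handler(w http.ResponseWriter, r *http.Request) {
	body, ok := lookup(r.URL.Path)
	if !ok {
		http.NotFound(w, r)
		return
	}
	fmt.Fprint(w, body)
}

func main() {
	// httptest picks a free local port, so nothing has to listen on 9092.
	srv := httptest.NewServer(http.HandlerFunc(handler))
	defer srv.Close()

	endpoint := srv.URL + metadataPrefix
	paths := []string{"instance-id", "instance-type", "spot/instance-action", "events/recommendations/rebalance"}
	for _, p := range paths {
		resp, err := http.Get(endpoint + p)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("%s -> %d %s\n", p, resp.StatusCode, body)
	}
}
```

In a test, the value of `endpoint` would play the role of the --metadata-endpoint setting.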

Metrics

# HELP aws_instance_metadata_service_available Metadata service available
# TYPE aws_instance_metadata_service_available gauge
aws_instance_metadata_service_available{instance_id="i-0d2aab13057917887"} 1
# HELP aws_instance_metadata_service_events_available Metadata service events endpoint available
# TYPE aws_instance_metadata_service_events_available gauge
aws_instance_metadata_service_events_available{instance_id="i-0d2aab13057917887"} 1
# HELP aws_instance_rebalance_recommended Instance rebalance is recommended
# TYPE aws_instance_rebalance_recommended gauge
aws_instance_rebalance_recommended{instance_id="i-0d2aab13057917887",instance_type="c5.9xlarge"} 1
# HELP aws_instance_termination_imminent Instance is about to be terminated
# TYPE aws_instance_termination_imminent gauge
aws_instance_termination_imminent{instance_action="stop",instance_id="i-0d2aab13057917887",instance_type="c5.9xlarge"} 1
# HELP aws_instance_termination_in Instance will be terminated in
# TYPE aws_instance_termination_in gauge
aws_instance_termination_in{instance_id="i-0d2aab13057917887",instance_type="c5.9xlarge"} 119.714615
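To show how the two termination gauges relate to the metadata, here is a small Go sketch that derives their values from an instance-action document. It rests on assumptions: that spot/instance-action returns JSON with `action` and `time` fields, and that aws_instance_termination_in is the number of seconds remaining until that time (the sample value of roughly 119.7 is consistent with the two-minute notice). The names `instanceAction`, `parseInstanceAction` and `secondsUntil` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// instanceAction mirrors the assumed JSON body of spot/instance-action.
type instanceAction struct {
	Action string    `json:"action"`
	Time   time.Time `json:"time"`
}

// parseInstanceAction decodes the metadata response body.
func parseInstanceAction(body []byte) (instanceAction, error) {
	var a instanceAction
	if err := json.Unmarshal(body, &a); err != nil {
		return instanceAction{}, fmt.Errorf("decoding instance action: %w", err)
	}
	return a, nil
}

// secondsUntil returns the time left before the action takes effect,
// which is the assumed meaning of aws_instance_termination_in.
func secondsUntil(a instanceAction, now time.Time) float64 {
	return a.Time.Sub(now).Seconds()
}

func main() {
	// Sample payload and clock reading, both invented for this example.
	body := []byte(`{"action": "stop", "time": "2017-09-18T08:22:00Z"}`)
	now := time.Date(2017, 9, 18, 8, 20, 0, 0, time.UTC)

	a, err := parseInstanceAction(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Labels other than instance_action are omitted to keep the sketch short.
	fmt.Printf("aws_instance_termination_imminent{instance_action=%q} 1\n", a.Action)
	fmt.Printf("aws_instance_termination_in %g\n", secondsUntil(a, now))
}
```

A real exporter would register these values through the Prometheus client library rather than printing them. The print statements only mirror the sample output format.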