An implementation of the custom.metrics.k8s.io API using Prometheus

Primary language: Go. License: Apache License 2.0 (Apache-2.0).

Prometheus Adapter for Kubernetes Metrics APIs

This repository contains an implementation of the Kubernetes resource metrics API and custom metrics API.

This adapter is therefore suitable for use with the autoscaling/v2 Horizontal Pod Autoscaler in Kubernetes 1.6+.
It can also replace the metrics server on clusters that already run Prometheus and collect the appropriate metrics.

Installation

If you're a Helm user, a Helm chart is listed on the Kubeapps Hub as stable/prometheus-adapter.

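To install it with the release name my-release, run this Helm 2 command:

$ helm install --name my-release stable/prometheus-adapter

Helm 3 removed the --name flag and takes the release name as the first argument instead:

$ helm install my-release stable/prometheus-adapter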

Configuration

The adapter takes the standard Kubernetes generic API server arguments (including those for authentication and authorization). By default, it will attempt to use the Kubernetes in-cluster config to connect to the cluster.

It takes the following additional arguments specific to configuring how the adapter talks to Prometheus and the main Kubernetes cluster:

  • --lister-kubeconfig=<path-to-kubeconfig>: This configures how the adapter talks to a Kubernetes API server in order to list objects when operating with label selectors. By default, it will use in-cluster config.

  • --metrics-relist-interval=<duration>: This is the interval at which to update the cache of available metrics from Prometheus. Since the adapter only lists metrics during discovery that exist between the current time and the last discovery query, your relist interval should be equal to or larger than your Prometheus scrape interval; otherwise, your metrics will occasionally disappear from the adapter.

  • --prometheus-url=<url>: This is the URL used to connect to Prometheus. It will eventually contain query parameters to configure the connection.

  • --config=<yaml-file> (-c): This configures how the adapter discovers available Prometheus metrics and the associated Kubernetes resources, and how it presents those metrics in the custom metrics API. More information about this file can be found in docs/config.md.
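To make the --metrics-relist-interval guidance concrete, here is a minimal Go sketch of a pre-flight check. checkRelistInterval is a hypothetical helper, not part of the adapter; it assumes both intervals are written in Go duration syntax (such as 30s or 1m) and simply enforces the "equal to or larger than the scrape interval" rule described above.

```go
package main

import (
	"fmt"
	"time"
)

// checkRelistInterval is a hypothetical helper, not part of the adapter.
// It parses the relist interval and the Prometheus scrape interval (Go
// duration syntax, e.g. "30s", "1m") and returns an error when the relist
// interval is shorter than the scrape interval, the situation in which
// metrics can intermittently drop out of discovery.
func checkRelistInterval(relist, scrape string) error {
	r, err := time.ParseDuration(relist)
	if err != nil {
		return fmt.Errorf("invalid relist interval %q: %w", relist, err)
	}
	s, err := time.ParseDuration(scrape)
	if err != nil {
		return fmt.Errorf("invalid scrape interval %q: %w", scrape, err)
	}
	if r < s {
		return fmt.Errorf("relist interval %s is shorter than scrape interval %s", r, s)
	}
	return nil
}

func main() {
	fmt.Println(checkRelistInterval("1m", "30s"))  // <nil>
	fmt.Println(checkRelistInterval("15s", "30s")) // relist interval 15s is shorter than scrape interval 30s
}
```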

Presentation

The adapter gathers the names of available metrics from Prometheus at a regular interval (see Configuration above), and then only exposes metrics that follow specific forms.

The rules governing this discovery are specified in a configuration file. If you were relying on the implicit rules from the previous version of the adapter, you can use the included config-gen tool to generate a configuration that matches the old implicit ruleset:

$ go run cmd/config-gen/main.go [--rate-interval=<duration>] [--label-prefix=<prefix>]

Example

A brief walkthrough exists in docs/walkthrough.md.

Additionally, @luxas has an excellent example deployment of Prometheus, this adapter, and a demo pod which serves a metric http_requests_total, which becomes the custom metrics API metric pods/http_requests. It also autoscales on that metric using the autoscaling/v2beta1 HorizontalPodAutoscaler. Note that @luxas's tutorial uses a slightly older version of the adapter.

It can be found at https://github.com/luxas/kubeadm-workshop. Pay special attention to:

FAQs

Why do my metrics keep jumping between a normal value and a very large number?

You're probably switching between whole numbers (e.g. 10) and milli-quantities (e.g. 10500m). Worry not! This is just how Kubernetes represents fractional values. See the Quantity Values section of the walkthrough for a bit more information.
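The conversion between the two notations is just a factor of 1000: 10500m means 10500 / 1000 = 10.5. The Go sketch below illustrates this with a deliberately simplified, hypothetical parser (parseMilliQuantity) that understands only plain decimal numbers and the m suffix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMilliQuantity is a simplified, hypothetical parser. It handles only
// plain decimal numbers and the "m" (milli, 1/1000) suffix; real Kubernetes
// quantities support many more suffixes (k, Mi, Gi, ...), which it ignores.
func parseMilliQuantity(q string) (float64, error) {
	if strings.HasSuffix(q, "m") {
		v, err := strconv.ParseFloat(strings.TrimSuffix(q, "m"), 64)
		if err != nil {
			return 0, err
		}
		return v / 1000, nil // milli-units to whole units
	}
	return strconv.ParseFloat(q, 64)
}

func main() {
	for _, q := range []string{"10", "10500m"} {
		v, err := parseMilliQuantity(q)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %g\n", q, v) // "10 = 10", then "10500m = 10.5"
	}
}
```

For real code, resource.ParseQuantity from k8s.io/apimachinery/pkg/api/resource handles the full Kubernetes quantity syntax.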

Why isn't my metric showing up?

First, check your configuration. Does it select your metric? You can find the default configuration in the deploy directory, and more information about configuring the adapter in the docs.

Next, check if the discovery information looks right. You should see the metrics showing up as associated with the resources you expect at /apis/custom.metrics.k8s.io/v1beta1/ (you can use kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 to check, and can pipe to jq to pretty-print the results, if you have it installed). If not, make sure your series are labeled correctly. Consumers of the custom metrics API (especially the HPA) don't do any special logic to associate a particular resource to a particular series, so you have to make sure that the adapter does it instead.
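If jq isn't available, a few lines of Go can summarize the same discovery response. This sketch assumes the JSON shape shown later in this FAQ (a resources list whose entries have names like pods/http_requests); metricsByResource is a hypothetical helper, and the sample document in main is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// apiResourceList mirrors only the part of the discovery response
// that this sketch reads; other fields are ignored when decoding.
type apiResourceList struct {
	Resources []struct {
		Name string `json:"name"`
	} `json:"resources"`
}

// metricsByResource groups discovered metric names by their resource
// prefix, e.g. "pods/http_requests" -> "pods": ["http_requests"].
func metricsByResource(data []byte) (map[string][]string, error) {
	var list apiResourceList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	out := make(map[string][]string)
	for _, r := range list.Resources {
		resource, metric, ok := strings.Cut(r.Name, "/")
		if !ok {
			continue // skip entries without a resource prefix
		}
		out[resource] = append(out[resource], metric)
	}
	for _, metrics := range out {
		sort.Strings(metrics) // sorts each slice in place
	}
	return out, nil
}

// Hypothetical sample of `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1` output.
const sampleDiscovery = `{"kind":"APIResourceList","groupVersion":"custom.metrics.k8s.io/v1beta1","resources":[
 {"name":"pods/http_requests","namespaced":true,"kind":"MetricValueList","verbs":["get"]},
 {"name":"namespaces/http_requests","namespaced":false,"kind":"MetricValueList","verbs":["get"]}]}`

func main() {
	grouped, err := metricsByResource([]byte(sampleDiscovery))
	if err != nil {
		panic(err)
	}
	resources := make([]string, 0, len(grouped))
	for r := range grouped {
		resources = append(resources, r)
	}
	sort.Strings(resources)
	for _, r := range resources {
		fmt.Printf("%s: %s\n", r, strings.Join(grouped[r], ", "))
	}
}
```

To inspect a live cluster, the same function could be fed the bytes printed by kubectl get --raw instead of the hard-coded sample.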

For example, if you want a series foo to be associated with deployment bar in namespace somens, make sure there's some label that represents deployment name, and that the adapter is configured to use it. With the default config, that means you'd need the query foo{namespace="somens",deployment="bar"} to return some results in Prometheus.
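The selector in the example above can be assembled programmatically, which is handy when testing candidate series in the Prometheus console. seriesSelector is a hypothetical helper; it keeps matchers in the order given and quotes values with Go string escaping, which PromQL string literals also follow.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// labelMatcher is one equality matcher, label="value".
type labelMatcher struct {
	Name, Value string
}

// seriesSelector builds an instant-vector selector such as
// foo{namespace="somens",deployment="bar"}, preserving matcher order.
func seriesSelector(metric string, matchers []labelMatcher) string {
	parts := make([]string, len(matchers))
	for i, m := range matchers {
		parts[i] = m.Name + "=" + strconv.Quote(m.Value)
	}
	return metric + "{" + strings.Join(parts, ",") + "}"
}

func main() {
	fmt.Println(seriesSelector("foo", []labelMatcher{
		{"namespace", "somens"},
		{"deployment", "bar"},
	}))
	// foo{namespace="somens",deployment="bar"}
}
```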

Next, run the adapter with the --v=6 flag to see the exact queries it makes. URL-decode a query and paste it into the Prometheus web console to check whether the query looks wrong.
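URL-decoding can be done with Go's standard net/url package. In this sketch, decodePromQuery is a hypothetical helper and the URL in main is invented (the adapter's actual log lines may look different); it assumes the PromQL expression is carried in the query parameter, as it is for Prometheus's /api/v1/query endpoint.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodePromQuery extracts the PromQL expression from a Prometheus HTTP
// API URL and returns it URL-decoded (Query() also turns '+' into ' ').
func decodePromQuery(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return u.Query().Get("query"), nil
}

func main() {
	// Hypothetical URL copied from a log line.
	raw := "http://prometheus:9090/api/v1/query?query=sum%28rate%28http_requests_total%7Bnamespace%3D%22default%22%7D%5B2m%5D%29%29+by+%28pod%29"
	q, err := decodePromQuery(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(q) // sum(rate(http_requests_total{namespace="default"}[2m])) by (pod)
}
```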

My query contains multiple metrics, how do I make that work?

It's actually fairly straightforward, if a bit non-obvious. Choose one metric to act as the "discovery" and "naming" metric, and use it to configure the "discovery" and "naming" parts of the configuration. Then you can reference whichever metrics you want in the metricsQuery, as long as they have the right set of labels.

For example, suppose you have two metrics, foo_total and foo_count, both with the label system_name, which represents the node resource. Then you might write:

rules:
- seriesQuery: 'foo_total'
  resources: {overrides: {system_name: {resource: "node"}}}
  name:
    matches: 'foo_total'
    as: 'foo'
  metricsQuery: 'sum(foo_total{<<.LabelMatchers>>}) by (<<.GroupBy>>) / sum(foo_count{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
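The metricsQuery field is a Go template using << and >> as delimiters, as the <<.LabelMatchers>> placeholders suggest. As a rough sketch of the PromQL that results, the following Go program renders the query above with text/template. The values for LabelMatchers and GroupBy (system_name="node1" and system_name) are hand-written assumptions, not output captured from the adapter, and renderQuery is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// queryValues holds the two template fields used by the metricsQuery above.
type queryValues struct {
	LabelMatchers string
	GroupBy       string
}

const metricsQuery = `sum(foo_total{<<.LabelMatchers>>}) by (<<.GroupBy>>) / sum(foo_count{<<.LabelMatchers>>}) by (<<.GroupBy>>)`

// renderQuery expands a template that uses << >> delimiters.
// text/template performs no escaping, so quotes pass through unchanged.
func renderQuery(tmpl string, v queryValues) (string, error) {
	t, err := template.New("metricsQuery").Delims("<<", ">>").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, v); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	q, err := renderQuery(metricsQuery, queryValues{
		LabelMatchers: `system_name="node1"`, // hypothetical matcher
		GroupBy:       "system_name",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
	// sum(foo_total{system_name="node1"}) by (system_name) / sum(foo_count{system_name="node1"}) by (system_name)
}
```

Both metrics receive the same matchers and grouping, which is why they need the same set of labels.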

I get errors about SubjectAccessReviews/system:anonymous/TLS/Certificates/RequestHeader!

It's important to understand the role of TLS in the Kubernetes cluster. There's a high-level overview here: https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/auth.md.

All of the above errors generally boil down to misconfigured certificates. Specifically, you'll need to make sure your cluster's aggregation layer is properly configured, with requestheader certificates set up properly.

Errors about SubjectAccessReviews failing for system:anonymous generally mean that your cluster's given requestheader CA doesn't trust the proxy certificates from the API server aggregator.

On the other hand, if you get an error from the aggregator about invalid certificates, it's probably because the CA specified in the caBundle field of your APIService object doesn't trust the serving certificates for the adapter.

If you're seeing SubjectAccessReviews failures for non-anonymous users, check your RBAC rules -- you probably haven't given users permission to operate on resources in the custom.metrics.k8s.io API group.

My metrics appear and disappear

You probably have a Prometheus collection interval or computation interval that's larger than your adapter's discovery interval. If the metrics appear in discovery but occasionally return not-found, those intervals are probably larger than one of the rate windows used in one of your queries. The adapter only considers metrics with datapoints in the window [now-discoveryInterval, now] (so that it only captures metrics that are still present), so make sure that your discovery interval is at least as large as your collection interval.

I get errors when querying namespace-prefixed metrics

I have namespace-prefixed metrics such as { "name": "namespaces/node_memory_PageTables_bytes", "singularName": "", "namespaced": false, "kind": "MetricValueList", "verbs": [ "get" ] }, but accessing one with kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/node_memory_PageTables_bytes fails with the error Error from server (InternalError): Internal error occurred: unable to list matching resources.

Namespace-prefixed metrics are special: access them with kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/node_memory_PageTables_bytes instead.
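The path rule above is easy to encode. namespaceMetricPath is a hypothetical helper that turns a discovery entry named namespaces/<metric> into the path to query; it assumes the v1beta1 API version used throughout this FAQ.

```go
package main

import (
	"fmt"
	"strings"
)

const customMetricsBase = "/apis/custom.metrics.k8s.io/v1beta1"

// namespaceMetricPath maps a discovery entry "namespaces/<metric>" to
// the raw path namespaces/*/metrics/<metric>. Entries for other
// resources (pods/..., nodes/...) are rejected with ok == false.
func namespaceMetricPath(discoveryName string) (path string, ok bool) {
	const prefix = "namespaces/"
	if !strings.HasPrefix(discoveryName, prefix) {
		return "", false
	}
	metric := strings.TrimPrefix(discoveryName, prefix)
	return customMetricsBase + "/namespaces/*/metrics/" + metric, true
}

func main() {
	p, ok := namespaceMetricPath("namespaces/node_memory_PageTables_bytes")
	if !ok {
		panic("expected a namespaces/ entry")
	}
	fmt.Println(p)
	// /apis/custom.metrics.k8s.io/v1beta1/namespaces/*/metrics/node_memory_PageTables_bytes
}
```

When passing such a path to kubectl get --raw, quote it (for example with single quotes) so the shell does not try to expand the * as a glob.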