Prometheus Midware

Hierarchy and Architecture

Modules and contents:

- midware_prometheus/
    - daemonize.py
    - prometheus_main.py
    - prometheus.py
    - config.json
  • daemonize.py: This module helps the user run the Prometheus middleware as a daemon.
  • prometheus_main.py: This module contains the user interface of the Prometheus middleware (a dispatch sketch follows this list).
  • prometheus.py: This module contains the main control logic of the entire Prometheus middleware, including grabbing data from the Prometheus server, formatting the scraped data, writing the data to a CSV file, etc.
  • config.json: This is the main configuration file of the Prometheus middleware.
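The source of prometheus_main.py is not reproduced here; the following is a minimal sketch of how its command dispatch presumably works, based only on the usage shown in the next section. The function name main and the printed messages are illustrative, not the module's actual code.

import sys

def main():
    # Dispatch on the single command-line argument (see the Usage section below).
    if len(sys.argv) != 2 or sys.argv[1] not in ("daemon", "restart", "stop", "start"):
        print("usage: python3 prometheus_main.py [daemon|restart|stop|start]")
        sys.exit(1)

    command = sys.argv[1]
    if command == "start":
        print("Starting...")
        # Run the middleware loop in the foreground (logic lives in prometheus.py).
    elif command == "daemon":
        print("Starting...")
        # Detach from the terminal first (daemonize.py), then run the loop.
    elif command == "stop":
        print("Stopping...")
        # Signal the daemonized process to terminate.
    elif command == "restart":
        # Stop the running daemon, then start a new one.
        pass

if __name__ == "__main__":
    main()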

Usage

$ python3 prometheus_main.py [daemon|restart|stop|start]

The Prometheus middleware can be run in either the foreground or the background.

Foreground

  • start: Runs the middleware normally in the foreground.
$ python3 prometheus_main.py start
Starting...
Daemonize_off
...

Background

  • daemon: Daemonizes the process using the daemonize module implemented by 鎮寧. The middleware is detached from the current terminal and then runs in the background (a sketch of the double-fork technique follows the example below).
  • stop: Kills the background Prometheus middleware process.
$ python3 prometheus_main.py daemon
Starting...
$ python3 prometheus_main.py stop
Stopping...
Daemon_has_stoped
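daemonize.py itself is not shown here; below is a minimal sketch of the classic double-fork technique such a module typically implements. The function name daemonize is illustrative and not necessarily the module's actual API.

import os
import sys

def daemonize():
    # First fork: the parent exits, so the child is re-parented to init.
    if os.fork() > 0:
        sys.exit(0)

    # Start a new session to detach from the controlling terminal.
    os.setsid()

    # Second fork: guarantee the process can never re-acquire a terminal.
    if os.fork() > 0:
        sys.exit(0)

    # Redirect stdin, stdout, and stderr to /dev/null.
    sys.stdout.flush()
    sys.stderr.flush()
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)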

Configuration File

All settings related to the Prometheus middleware can be configured in config.json.

Below is a sample config.json:

{
    "out_Dir": "test",
    "url": "http://dev.k8s:31390",
    "configs": [
        {
            "ip": "172.16.1.99",
            "exporter": "kubelet",
            "probe": "kubernetes_probe",
            "metrics": {
                "kubernetes_cpu_usage_sum": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",pod!=\"\"}[3m]))",
                "kubernetes_cpu_usage_request": "sum(kube_pod_container_resource_requests_cpu_cores)",
                "kubernetes_memory_usage_sum": "sum(rate(container_memory_usage_bytes{container!=\"POD\",pod!=\"\"}[3m]))",
                "kubernetes_memory_usage_request": "sum(kube_pod_container_resource_requests_memory_bytes)",
                "kubernetes_network_transmit_bytes_total": "sum(rate(container_network_transmit_bytes_total{container!=\"POD\"}[3m]))",
                "kubernetes_network_receive_bytes_total": "sum(rate(container_network_receive_bytes_total{container!=\"POD\"}[3m]))",
                "kubernetes_container_restart_total": "sum(kube_pod_container_status_restarts_total)"
            },
            "write_metrics": []
        },
        {
            "ip": "172.16.1.99",
            "exporter": "kubelet",
            "probe": "kubernetes_container_probe",
            "metrics": {
                "kubernetes_container_cpu_usage": "rate(container_cpu_usage_seconds_total{container!=\"POD\",pod!=\"\"}[3m])",
                "kubernetes_container_memory_usage": "rate(container_memory_usage_bytes{container!=\"POD\",pod!=\"\"}[3m])"
            },
            "write_metrics": [
                "namespace",
                "pod",
                "container"
            ]
        },
        {
            "ip": "172.16.1.99",
            "exporter": "kubelet",
            "probe": "kubernetes_pod_probe",
            "metrics": {
                "kubernetes_pod_cpu_usage": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",pod!=\"\"}[3m])) by (pod)",
                "kubernetes_pod_memory_usage": "sum(rate(container_memory_usage_bytes{container!=\"POD\",pod!=\"\"}[3m])) by (pod)"
            },
            "write_metrics": [
                "pod"
            ]
        },
        {
            "ip": "172.16.1.99",
            "exporter": "kubelet",
            "probe": "kubernetes_node_probe",
            "metrics": {
                "kubernetes_node_allocable_pods": "kube_node_status_allocatable_pods",
                "kubernetes_node_allocable_cpu_core": "kube_node_status_allocatable_cpu_cores",
                "kubernetes_node_allocable_memory": "kube_node_status_allocatable_memory_bytes"
            },
            "write_metrics": [
                "node"
            ]
        },
        {
            "ip": "172.16.1.99",
            "exporter": "kubelet",
            "probe": "kubernetes_namespace_probe",
            "metrics": {
                "kubernetes_namespace_cpu_usage": "sum(rate(container_cpu_usage_seconds_total{container!=\"POD\",namespace!=\"\"}[3m])) by (namespace)",
                "kubernetes_namespace_memory_usage": "sum(rate(container_memory_usage_bytes{container!=\"POD\",namespace!=\"\"}[3m])) by (namespace)"
            },
            "write_metrics": [
                "namespace"
            ]
        }
    ]
}
  • out_Dir: Specifies the output directory of the CSV files.
  • url: Specifies the URL of your Prometheus server.
  • configs: An array that contains the settings of the different probes. Each element of this array is an object with the following fields:
    • ip: The IP address of your target.
    • exporter: The exporter that the probe uses.
    • probe: The name of the probe.
    • metrics: An object that maps each metric name to the PromQL expression the probe should query (see the Prometheus documentation for more information on PromQL). A query sketch follows this list.
    • write_metrics: An array of metric labels that you want to write to the CSV file.
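prometheus.py is not reproduced here, but the way these fields drive a query can be sketched as follows. The sketch assumes only the standard Prometheus HTTP API (GET /api/v1/query); the function query_prometheus and the inline probe_cfg example are illustrative, not the middleware's actual code.

import requests

def query_prometheus(url, promql):
    # Run an instant query against the Prometheus HTTP API and return the result vector.
    resp = requests.get(f"{url}/api/v1/query", params={"query": promql}, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]

# Illustrative probe entry, shaped like one element of "configs" above.
probe_cfg = {
    "ip": "172.16.1.99",
    "probe": "kubernetes_pod_probe",
    "metrics": {
        "kubernetes_pod_cpu_usage":
            'sum(rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[3m])) by (pod)'
    },
    "write_metrics": ["pod"]
}

for name, promql in probe_cfg["metrics"].items():
    for sample in query_prometheus("http://dev.k8s:31390", promql):
        labels = sample["metric"]            # e.g. {"pod": "calico-node-2pnlr"}
        timestamp, value = sample["value"]   # [unix_timestamp, "0.0101827..."]
        extra = [labels.get(label, "") for label in probe_cfg["write_metrics"]]
        print([value, name, *extra, probe_cfg["ip"]])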

Output CSV

The name format of the CSV file is shown below:

Format:    <probe name>@<date>.csv
Example 1: kubernetes_probe@20201204_16_08.csv
Example 2: kubernetes_node_probe@20201204_16_10.csv
  • 📝 Date format is "%Y%m%d_%H_%M"

The format of each row is shown below (a writer sketch follows the note):

Format:    value, subprobe name, subfields..., target IP, date
Example 1: 0.010182706464034987, kubernetes_container_cpu_usage, kube-system (namespace), calico-node (container), calico-node-2pnlr (pod), 172.16.1.99, 20201204_16:08:17
Example 2: 110, kubernetes_node_allocable_pods, 1.dev.k8s (node), 172.16.1.99, 20201204_16:10:13
  • 📝 Date format is "%Y%m%d_%H:%M:%S"
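A minimal sketch of how the file name and rows can be produced with the formats above; csv_filename and write_rows are illustrative names, not the middleware's actual functions.

import csv
import os
from datetime import datetime

def csv_filename(probe_name, now=None):
    # Build "<probe name>@<%Y%m%d_%H_%M>.csv".
    now = now or datetime.now()
    return f"{probe_name}@{now.strftime('%Y%m%d_%H_%M')}.csv"

def write_rows(out_dir, probe_name, rows):
    # Append rows of [value, subprobe name, *subfields, target IP] and stamp each
    # one with the "%Y%m%d_%H:%M:%S" date used in the examples above.
    stamp = datetime.now().strftime("%Y%m%d_%H:%M:%S")
    path = os.path.join(out_dir, csv_filename(probe_name))
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow([*row, stamp])

# e.g. write_rows("test", "kubernetes_probe",
#                 [["23.3665936816224", "kubernetes_cpu_usage_sum", "172.16.1.99"]])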

Sample

Below is a sample output CSV file, kubernetes_probe@20201204_16_08.csv:

23.3665936816224,kubernetes_cpu_usage_sum,172.16.1.99,20201204_16:08:17
1.65,kubernetes_cpu_usage_request,172.16.1.99,20201204_16:08:17
16913905.219381742,kubernetes_memory_usage_sum,172.16.1.99,20201204_16:08:17
817889280,kubernetes_memory_usage_request,172.16.1.99,20201204_16:08:17
54819.22700615068,kubernetes_network_transmit_bytes_total,172.16.1.99,20201204_16:08:17
49244.03774023048,kubernetes_network_receive_bytes_total,172.16.1.99,20201204_16:08:17
11660,kubernetes_container_restart_total,172.16.1.99,20201204_16:08:17

Kubernetes Exporter

kubelet metrics

  • Provides metrics via cAdvisor.
  • Provides container-level metrics such as resource usage from running containers.

kube-state-metrics

  • A simple service that listens to the Kubernetes API server and generates metrics about the state of Kubernetes.
  • It focuses on the state of the various objects inside Kubernetes, such as metrics for pods, deployments, replica sets, etc.

apiserver metrics

  • Provides metrics via the kube-apiserver.
  • Provides cluster-level metrics that monitor non-containerized workloads, such as load-balanced cluster services, client certificates, and so on.

Kubernetes Metrics

The following are the metrics used in this project:

  • container_cpu_usage_seconds_total
    • Exporter: kubelet
    • Description: The current cumulative CPU usage time of the container
  • container_memory_usage_bytes
    • Exporter: kubelet
    • Description: The current memory usage of the container (in bytes)
  • container_network_transmit_bytes_total
    • Exporter: kubelet
    • Description: The cumulative amount of data transmitted in the container network
  • container_network_receive_bytes_total
    • Exporter: kubelet
    • Description: The cumulative amount of data received in the container network
  • kube_pod_container_resource_requests_cpu_cores
    • Exporter: kube-state-metrics
    • Description: The number of CPU cores currently requested by the Pod
  • kube_pod_container_status_restarts_total
    • Exporter: kube-state-metrics
    • Description: Cumulative number of container restarts per Pod
  • kube_pod_container_resource_requests_memory_bytes
    • Exporter: kube-state-metrics
    • Description: The amount of memory (in bytes) currently requested by the Pod
  • kube_node_status_allocatable_cpu_cores
    • Exporter: kube-state-metrics
    • Description: CPU resources currently allocatable on the Node
  • kube_node_status_allocatable_memory_bytes
    • Exporter: kube-state-metrics
    • Description: Memory resources currently allocatable on the Node
  • kube_node_status_allocatable_pods
    • Exporter: kube-state-metrics
    • Description: Number of Pods currently allocatable on the Node
  • apiserver_request_total
    • Exporter: kube-apiserver
    • Description: Counts requests to the kube-apiserver, including their source, destination, and whether they were successful

Kubernetes Probe

  • kubernetes_cpu_usage_sum
    • Metric: sum(rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[3m]))
    • Description: Collect the cumulative CPU usage time of the entire Kubernetes in the past 3 minutes.
  • kubernetes_memory_usage_sum
    • Metric: sum(rate(container_memory_usage_bytes{container!="POD",pod!=""}[3m]))
    • Description: Collect the cumulative memory usage of the entire Kubernetes in the past 3 minutes.
  • kubernetes_cpu_usage_request
    • Metric: sum(kube_pod_container_resource_requests_cpu_cores)
    • Description: Collect the total number of CPU cores requested by Pods across the entire Kubernetes cluster.
  • kubernetes_memory_usage_request
    • Metric: sum(kube_pod_container_resource_requests_memory_bytes)
    • Description: Collect the total memory (in bytes) requested by Pods across the entire Kubernetes cluster.
  • kubernetes_network_receive_bytes_total
    • Metric: sum(rate(container_network_receive_bytes_total{container!="POD"}[3m]))
    • Description: Collect the cumulative received data volume of the entire Kubernetes in the past 3 minutes.
  • kubernetes_network_transmit_bytes_total
    • Metric: sum(rate(container_network_transmit_bytes_total{container!="POD"}[3m]))
    • Description: Collect the cumulative transmitted data volume of the entire Kubernetes in the past 3 minutes.
  • kubernetes_container_restart_total
    • Metric: sum(kube_pod_container_status_restarts_total)
    • Description: Collect the cumulative number of Pod restarts in the entire Kubernetes.

Kubernetes Container Probe

  • kubernetes_container_cpu_usage
    • Metric: rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[3m])
    • Description: Collect the cumulative CPU usage time of each container in the past 3 minutes.
  • kubernetes_container_memory_usage
    • Metric: rate(container_memory_usage_bytes{container!="POD",pod!=""}[3m])
    • Description: Collect the cumulative memory usage of each container in the past 3 minutes.

Kubernetes Pod Probe

  • kubernetes_pod_cpu_usage
    • Metric: sum(rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[3m])) by (pod)
    • Description: Collect the cumulative CPU usage time of different Pods in the past 3 minutes.
  • kubernetes_pod_memory_usage
    • Metric: sum(rate(container_memory_usage_bytes{container!="POD",pod!=""}[3m])) by (pod)
    • Description: Collect the cumulative memory usage of different Pods in the past 3 minutes.

Kubernetes Node Probe

  • kubernetes_node_allocable_pods
    • Metric: kube_node_status_allocatable_pods
    • Description: Collect the number of Pods currently allocatable on each Node.
  • kubernetes_node_allocable_cpu_core
    • Metric: kube_node_status_allocatable_cpu_cores
    • Description: Collect the CPU resources currently allocatable on each Node.
  • kubernetes_node_allocable_memory
    • Metric: kube_node_status_allocatable_memory_bytes
    • Description: Collect the amount of memory (in bytes) currently allocatable on each Node.

Kubernetes Apiserver Probe

  • kubernetes_apiserver_success_requests
    • Metric: sum(rate(apiserver_request_total{code=~"2.."}[3m]))
    • Description: Collect all the successful requests handled by the kube-apiserver.
  • kubernetes_apiserver_failed_requests
    • Metric: sum(rate(apiserver_request_total{code=~"[45].."}[3m]))
    • Description: Collect all the failed requests handled by the kube-apiserver.
  • 📝 Note that some of the above metrics compute an average over a 3-minute window. The interval can be set to another suitable value by editing the PromQL expressions in config.json, as illustrated below.
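For example, the 3-minute window could be widened to 5 minutes; the substitution below is purely illustrative, since the actual middleware reads the full expressions from config.json.

INTERVAL = "5m"  # any suitable range, e.g. "1m", "5m", "10m"
promql = 'sum(rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[%s]))' % INTERVAL
print(promql)  # sum(rate(container_cpu_usage_seconds_total{container!="POD",pod!=""}[5m]))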

Reference

  1. Prometheus documentation
  2. Kubernetes in Production: The Ultimate Guide to Monitoring Resource Metrics with Prometheus
  3. Metrics used in Alamada