docker-engine-events-exporter

Expose Docker API events to Prometheus

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Docker engine events exporter (Docker Engine/Swarm)

Docker engine events exporter exposes Docker API events (oom, start, …) as Prometheus metrics.
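
To illustrate the idea, here is a minimal sketch of how one Docker event could be flattened into metric labels. It is not the exporter's actual code: the helper names `event_to_labels` and `metric_name` are hypothetical, and the naming scheme (event type plus `_attributes_` plus the sanitized attribute key) is an assumption inferred from label names such as `container_attributes_exitcode` that appear in the alert example further down. The event dictionary follows the shape returned by the Docker Engine API events endpoint.

```python
import re

# Characters that are not allowed in a Prometheus label name become "_".
_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def metric_name(event: dict) -> str:
    """Counter name for an event, e.g. docker_events_container_total (assumed scheme)."""
    return f"docker_events_{event.get('Type', 'unknown')}_total"


def event_to_labels(event: dict, docker_hostname: str) -> dict:
    """Flatten one Docker event into a flat label dict (hypothetical helper)."""
    event_type = event.get("Type", "unknown")
    labels = {
        "docker_hostname": docker_hostname,
        "status": event.get("Action", ""),
    }
    attributes = event.get("Actor", {}).get("Attributes", {})
    for key, value in attributes.items():
        raw_name = f"{event_type}_attributes_{key}"
        labels[_INVALID_LABEL_CHARS.sub("_", raw_name).lower()] = str(value)
    return labels


# A made-up "die" event for a Swarm service task.
example_event = {
    "Type": "container",
    "Action": "die",
    "Actor": {
        "ID": "abc123",
        "Attributes": {
            "exitCode": "137",
            "com.docker.swarm.service.name": "web",
        },
    },
}

example_labels = event_to_labels(example_event, "node-1")
```

With this scheme, the attribute `com.docker.swarm.service.name` ends up as the label `container_attributes_com_docker_swarm_service_name`, and the counter named by `metric_name` would be incremented once per event with these labels.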

This is a fork of sbadia/docker-events-exporter with a focus on usage in a Docker/Docker Swarm environment without Kubernetes.

Proudly made by NeuroForge in Bayreuth, Germany.

Use in a Docker Swarm deployment

Deploy:

version: "3.8"

services:
  docker-engine-events-exporter:
    image: ghcr.io/neuroforgede/docker-engine-events-exporter:latest
    networks:
      - net
    environment:
      - DOCKER_HOSTNAME={{.Node.Hostname}}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      mode: global
      resources:
        limits:
          memory: 256M
        reservations:
          memory: 128M

prometheus.yml

# ...
scrape_configs:
  - job_name: 'docker-engine-events-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.docker-engine-events-exporter'
      type: 'A'
      port: 9000
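
Since the scrape config does not override `metrics_path`, Prometheus requests `/metrics` on port 9000 of every task. To sanity-check a target by hand, the text exposition format can be parsed with a few lines of Python. The following is a rough sketch, not part of the exporter: `fetch_metrics` and `parse_samples` are hypothetical helpers, the sample text (including its HELP line) is made up for illustration, and escape sequences inside label values are not unescaped.

```python
import re
import urllib.request

# Matches sample lines such as: metric_name{label="value",other="x"} 3.0
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url: str) -> str:
    """Download the exposition text from a target (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def parse_samples(text: str, metric: str) -> list:
    """Return (labels, value) tuples for every sample of the given metric."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((labels, float(match.group("value"))))
    return samples


# Made-up exposition text for illustration only.
sample_text = """\
# HELP docker_events_container_total Docker events for containers
# TYPE docker_events_container_total counter
docker_events_container_total{docker_hostname="node-1",status="die",container_attributes_exitcode="137"} 2.0
docker_events_container_total{docker_hostname="node-1",status="start"} 5.0
"""

samples = parse_samples(sample_text, "docker_events_container_total")
```

From a container attached to the same `net` network, something like `parse_samples(fetch_metrics("http://tasks.docker-engine-events-exporter:9000/metrics"), "docker_events_container_total")` should work the same way, although the DNS name resolves to several tasks and only one of them is queried.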

A monitoring solution based on the original swarmprom that includes this exporter can be found in our Swarmsible repo.

Prometheus alerts

You can then configure Prometheus alerts based on these metrics, for example for containers that exit with a non-zero exit code:

  - alert: Container (Swarm) died/is dying with exit code other than 0
    expr: count by (docker_hostname, container_attributes_com_docker_swarm_service_name, container_attributes_exitcode, status) (
          (
              docker_events_container_total{status=~"die|.*oom.*|.*kill.*", container_attributes_exitcode != "0", container_attributes_exitcode != "" } 
              unless 
              docker_events_container_total{status=~"die|.*oom.*|.*kill.*", container_attributes_exitcode != "0", container_attributes_exitcode != "" }
              offset 10m
          ) OR (
              increase(docker_events_container_total{status=~"die|.*oom.*|.*kill.*", container_attributes_exitcode != "0", container_attributes_exitcode != "" }[10m]) > 0
          )
      )
    annotations:
      summary: "Bad Exit code \"{{ $labels.container_attributes_exitcode }}\" for status \"{{ $labels.status }}\" for service \"{{ $labels.container_attributes_com_docker_swarm_service_name }}\""
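
The logic of this expression can be easier to follow as plain code. The sketch below mimics the two branches in Python under simplifying assumptions: each series is represented by a frozenset of its labels, only two points in time are compared, and `increase()` is reduced to a simple difference (real Prometheus extrapolates and handles counter resets). The names `matches_selector` and `firing_groups` are hypothetical. The first branch presumably exists because a counter series that has just appeared has no earlier sample to compute an increase from.

```python
import re
from collections import Counter

# PromQL regex matchers are fully anchored, hence fullmatch() below.
STATUS_RE = re.compile(r"die|.*oom.*|.*kill.*")
GROUP_BY = (
    "docker_hostname",
    "container_attributes_com_docker_swarm_service_name",
    "container_attributes_exitcode",
    "status",
)


def matches_selector(labels: dict) -> bool:
    """Same filter as the selector: bad status and an exit code that is neither "" nor "0"."""
    exit_code = labels.get("container_attributes_exitcode", "")
    status_ok = STATUS_RE.fullmatch(labels.get("status", "")) is not None
    return status_ok and exit_code not in ("", "0")


def firing_groups(now: dict, ten_minutes_ago: dict) -> dict:
    """Count matching series per group that are new or have increased."""
    counts = Counter()
    for series, value in now.items():
        labels = dict(series)
        if not matches_selector(labels):
            continue
        is_new = series not in ten_minutes_ago  # branch 1: unless ... offset 10m
        has_increased = not is_new and value > ten_minutes_ago[series]  # branch 2
        if is_new or has_increased:
            counts[tuple(labels.get(name, "") for name in GROUP_BY)] += 1
    return dict(counts)


def series(service: str, exit_code: str, status: str = "die") -> frozenset:
    """Build a hashable series key for the made-up examples below."""
    return frozenset({
        "docker_hostname": "node-1",
        "container_attributes_com_docker_swarm_service_name": service,
        "container_attributes_exitcode": exit_code,
        "status": status,
    }.items())


now = {series("web", "137"): 1.0, series("web", "0"): 4.0, series("db", "1"): 2.0}
before = {series("web", "0"): 3.0, series("db", "1"): 2.0}
result = firing_groups(now, before)
```

In this made-up data only the `web` series with exit code 137 fires: it is new. The exit code 0 series is filtered out by the selector, and the `db` series has not increased within the window.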