Docker engine events exporter exposes Docker API events (oom, start, …) as Prometheus metrics.
This is a fork of sbadia/docker-events-exporter with a focus on usage in a Docker/Docker Swarm environment without Kubernetes.
Proudly made by NeuroForge in Bayreuth, Germany.
Deploy:
version: "3.8"

services:
  docker-engine-events-exporter:
    image: ghcr.io/neuroforgede/docker-engine-events-exporter:latest
    networks:
      - net
    environment:
      - DOCKER_HOSTNAME={{.Node.Hostname}}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    deploy:
      mode: global
      resources:
        limits:
          memory: 256M
        reservations:
          memory: 128M
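
The service above attaches to a network named `net`, which the snippet itself does not define. A minimal sketch of a matching top-level section follows, assuming an attachable overlay network shared with Prometheus. The driver and the `attachable` flag are assumptions for illustration, not settings confirmed by this project.

```yaml
# Assumed top-level network definition for the stack file above.
# "net" matches the name referenced under the service's "networks" key.
networks:
  net:
    driver: overlay    # assumption: Swarm overlay network spanning all nodes
    attachable: true   # assumption: lets other containers join the network
```

Prometheus has to be attached to the same network so that it can resolve and reach the exporter tasks.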
prometheus.yml:
# ...
scrape_configs:
  - job_name: 'docker-engine-events-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.docker-engine-events-exporter'
      type: 'A'
      port: 9000
A monitoring solution based on the original swarmprom that includes this exporter can be found in our Swarmsible repo.
You can then configure Prometheus alerts based on these metrics, for example for containers with bad exit codes:
- alert: Container (Swarm) died/is dying with exit code other than 0
  expr: count by (docker_hostname, container_attributes_com_docker_swarm_service_name, container_attributes_exitcode, status) (
      (
        docker_events_container_total{status=~"die|.*oom.*|.*kill.*", container_attributes_exitcode != "0", container_attributes_exitcode != "" }
        unless
        docker_events_container_total{status=~"die|.*oom.*|.*kill.*", container_attributes_exitcode != "0", container_attributes_exitcode != "" }
        offset 10m
      ) OR (
        increase(docker_events_container_total{status=~"die|.*oom.*|.*kill.*", container_attributes_exitcode != "0", container_attributes_exitcode != "" }[10m]) > 0
      )
    )
  annotations:
    summary: "Bad Exit code \"{{ $labels.container_attributes_exitcode }}\" for status \"{{ $labels.status }}\" for service \"{{ $labels.container_attributes_com_docker_swarm_service_name }}\""
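
The alert above is a single rule entry. Prometheus expects such entries inside a rule group in a file that `prometheus.yml` lists under `rule_files`. The sketch below shows that wrapper with a second, simpler alert on OOM events. The file name, group name, alert name, and `severity` label are hypothetical; only the metric `docker_events_container_total` and the labels `status` and `docker_hostname` are taken from the rule above.

```yaml
# alerts.yml (hypothetical file name), referenced from prometheus.yml via:
#   rule_files:
#     - "alerts.yml"
groups:
  - name: docker-engine-events        # hypothetical group name
    rules:
      - alert: ContainerOOMEvent      # hypothetical alert name
        # Fires when the counter for any OOM-related status grew in the last 10 minutes.
        expr: increase(docker_events_container_total{status=~".*oom.*"}[10m]) > 0
        labels:
          severity: warning           # assumed label, adjust to your routing
        annotations:
          summary: "OOM event on host \"{{ $labels.docker_hostname }}\" (status \"{{ $labels.status }}\")"
```

This simpler form relies on `increase()` alone, so it can miss the very first event of a series that has only just appeared. The rule above covers that case with its `unless ... offset 10m` branch.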