A comprehensive guide to collecting and exporting telemetry data (metrics, logs, and traces) from a Docker Swarm environment can be found at swarmlibs/dockerswarm-monitoring-guide.
A Docker Stack deployment of a monitoring suite for Docker Swarm. It includes Grafana, Prometheus, cAdvisor, Node exporter, and Blackbox prober exporter.
Important
This project is a work in progress and is not yet ready for production use, but feel free to test it and provide feedback.
Table of Contents:
- About
- Concepts
- Stacks
- Pre-requisites
- Getting Started
- Grafana
- Prometheus
- Services and Ports
- Troubleshooting
- License
This section covers some concepts that are important to understand for day-to-day Promstack usage and operation.
By design, the Prometheus server is configured to automatically discover and scrape metrics from the Docker Swarm nodes, services and tasks. You can use Docker object labels in the deploy block to automagically register services as targets for Prometheus. It is also configured with config provider and config reloader services.
Prometheus Kubernetes compatible labels
Here is a list of Docker Service/Task labels that are mapped to Kubernetes labels.
| Kubernetes | Docker | Scrape config |
|---|---|---|
| namespace | `__meta_dockerswarm_service_label_com_docker_stack_namespace` | |
| deployment | `__meta_dockerswarm_service_name` | |
| pod | `dockerswarm_task_name` | `dockerswarm/services` |
| service | `__meta_dockerswarm_service_name` | `dockerswarm/services-endpoints` |
- The dockerswarm_task_name is a combination of the service name, slot and task id.
- The task id is a unique identifier for the task. It depends on the mode of the deployment (replicated or global). If the service is replicated, the task id is the slot number. If the service is global, the task id is the node id.
The grafana and prometheus services require extra services to operate, mainly to provide configuration files. There are two types of child services: a config provider and a config reloader.
Here is an example visual representation of the services:
We leverage the following services:
- swarmlibs/prometheus-config-provider
- swarmlibs/grafana-provisioning-config-reloader
- prometheus-operator/prometheus-config-reloader
These are the services that are part of the stack:
- Blackbox exporter: https://github.com/prometheus/blackbox_exporter
- cAdvisor: https://github.com/google/cadvisor
- Grafana: https://github.com/grafana/grafana
- Node exporter: https://github.com/prometheus/node_exporter
- Prometheus: https://github.com/prometheus/prometheus
- Pushgateway: https://github.com/prometheus/pushgateway
Pre-requisites:
- Docker running in Swarm mode
- A Docker Swarm cluster with at least 3 nodes
- A Docker daemon configured to expose metrics for Prometheus
- The official swarmlibs stack, which provides the services that other stacks need to operate.
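The first three items can be checked from a manager node with a short script. The following is a minimal sketch, not something shipped with promstack: the function names are hypothetical, and `127.0.0.1:9323` is only the address used in Docker's own documentation examples, so substitute whatever `metrics-addr` you set in `daemon.json`.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; not part of promstack.

# Prints the local Swarm state, e.g. "active" or "inactive".
swarm_state() {
  docker info --format '{{.Swarm.LocalNodeState}}'
}

# Prints the number of nodes in the cluster (must run on a manager node).
node_count() {
  docker node ls --format '{{.ID}}' | wc -l | tr -d ' '
}

# Succeeds if the Docker daemon metrics endpoint answers.
# METRICS_ADDR is an assumption; it must match "metrics-addr" in daemon.json.
daemon_metrics_ok() {
  curl -fsS "http://${METRICS_ADDR:-127.0.0.1:9323}/metrics" >/dev/null
}

# Runs all three checks and stops at the first failure.
preflight() {
  [ "$(swarm_state)" = "active" ] || { echo "Swarm mode is not active" >&2; return 1; }
  [ "$(node_count)" -ge 3 ] || { echo "fewer than 3 nodes" >&2; return 1; }
  daemon_metrics_ok || { echo "daemon metrics endpoint not reachable" >&2; return 1; }
  echo "prerequisites look OK"
}
```

After sourcing the file, run `preflight` on a manager node; it returns a non-zero status and prints the first failing check.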
There are two ways to deploy the promstack stack:
- Unattended deployment
- Manually deploy the `promstack` stack

The unattended deployment is the recommended way to deploy the stack. It automatically creates the necessary networks and deploys the stack to the Docker Swarm cluster. The manual deployment is useful for debugging and troubleshooting the stack.
To deploy the stack, you can use the following command:

```shell
docker run -it --rm \
  --name promstack \
  -v /var/run/docker.sock:/var/run/docker.sock \
  swarmlibs/promstack install
```

For more documentation, visit https://github.com/swarmlibs/docker-promstack.
To get started, clone this repository to your local machine:

```shell
git clone https://github.com/swarmlibs/promstack.git
# or
gh repo clone swarmlibs/promstack
```

Navigate to the project directory:

```shell
cd promstack
```

Create user-defined networks:

```shell
make stack-networks
# or run the following commands to create the networks manually
docker network create --scope=swarm --driver=overlay --attachable public
docker network create --scope=swarm --driver=overlay --attachable prometheus
docker network create --scope=swarm --driver=overlay --attachable prometheus_gwnetwork
```

- The `public` network is used by the Ingress service and Blackbox exporter to perform network probes.
- The `prometheus` network is used to perform service discovery for Prometheus scrape configs.
- The `prometheus_gwnetwork` network is used for the internal communication between the Prometheus server, exporters and other agents.
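Once the networks are created, it can be useful to confirm that each one exists with the expected driver and scope. A minimal sketch follows; `check_overlay` and `check_networks` are hypothetical helper names, not something the repository provides.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if network "$1" is a swarm-scoped overlay network.
check_overlay() {
  # Prints e.g. "overlay swarm"; the inspect call fails if the network does not exist.
  info=$(docker network inspect --format '{{.Driver}} {{.Scope}}' "$1" 2>/dev/null) || return 1
  [ "$info" = "overlay swarm" ]
}

# Reports on the three networks used by the stack; returns 1 if any check fails.
check_networks() {
  status=0
  for net in public prometheus prometheus_gwnetwork; do
    if check_overlay "$net"; then
      echo "ok: $net"
    else
      echo "missing or misconfigured: $net"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_networks` prints one line per network, so a missing network is visible before the stack is deployed.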
The grafana and prometheus services require extra services to operate, mainly to provide configuration files. There are two types of child services: a config provider and a config reloader. To ensure correct placement of these services, you need to deploy the swarmlibs stack.
See https://github.com/swarmlibs/swarmlibs for more information.
This will deploy the stack to the Docker Swarm cluster. Please ensure that you have the necessary permissions to deploy the stack and that the swarmlibs stack is deployed. See Pre-requisites for more information.
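Whether the swarmlibs stack is already present can be checked before deploying. This is a minimal sketch with hypothetical helper names; it assumes the stack was deployed under the name `swarmlibs`, which may differ in your cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a stack named exactly "$1" is deployed.
stack_deployed() {
  docker stack ls --format '{{.Name}}' | grep -Fqx "$1"
}

# "swarmlibs" is an assumed stack name; override it with SWARMLIBS_STACK if needed.
require_swarmlibs() {
  if stack_deployed "${SWARMLIBS_STACK:-swarmlibs}"; then
    echo "swarmlibs stack found"
  else
    echo "swarmlibs stack not found; deploy it first" >&2
    return 1
  fi
}
```

Calling `require_swarmlibs` before the deploy step gives an early, readable failure instead of services that never get scheduled.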
Important
Note that `promstack` is the default stack namespace for this deployment.
It is NOT RECOMMENDED to change the stack namespace as it may cause issues with the deployment.
```shell
make deploy
```

Warning
This will remove the stack and all the services associated with it. Use with caution.

```shell
make remove
```

To verify the deployment, you can use the following command:
```shell
docker service ls --filter label=com.docker.stack.namespace=promstack
# ID NAME MODE REPLICAS IMAGE
# ** promstack_blackbox-exporter replicated 1/1 (max 1 per node) prom/blackbox-exporter:v0.25.0
# ** promstack_cadvisor global 1/1 gcr.io/cadvisor/cadvisor:v0.47.0
# ** promstack_grafana replicated 1/1 (max 1 per node) swarmlibs/grafana:main
# ** promstack_grafana-dashboard-provider replicated 1/1 (max 1 per node) swarmlibs/prometheus-config-provider:0.1.0-rc.1
# ** promstack_grafana-provisioning-config-reloader replicated 1/1 (max 1 per node) swarmlibs/grafana-provisioning-config-reloader:0.1.0-rc.1
# ** promstack_grafana-provisioning-dashboard-provider replicated 1/1 (max 1 per node) swarmlibs/prometheus-config-provider:0.1.0-rc.1
# ** promstack_grafana-provisioning-datasource-provider replicated 1/1 (max 1 per node) swarmlibs/prometheus-config-provider:0.1.0-rc.1
# ** promstack_node-exporter global 1/1 prom/node-exporter:v1.8.1
# ** promstack_prometheus global 1/1 swarmlibs/genconfig:0.1.0-rc.1
# ** promstack_prometheus-config-provider global 1/1 swarmlibs/prometheus-config-provider:0.1.0-rc.1
# ** promstack_prometheus-config-reloader global 1/1 quay.io/prometheus-operator/prometheus-config-reloader:v0.74.0
# ** promstack_prometheus-server replicated 1/1 (max 1 per node) prom/prometheus:v2.45.6
# ** promstack_pushgateway replicated 1/1 (max 1 per node) prom/pushgateway:v1.9.0
```

You can continuously monitor the deployment by running the following command:

```shell
# The `watch` command re-runs the service listing and refreshes the output every 2 seconds.
watch docker service ls --filter label=com.docker.stack.namespace=promstack
```

The Grafana service is configured with config provider and config reloader services. The config provider service supplies the configuration files for the Grafana service. The config reloader service reloads the Grafana configuration whenever the config provider service updates those files.
The following configurations are supported:
- Grafana Dashboards
- Provisioning (Datasources, Dashboards)
To inject Grafana dashboards, you need to specify a config object in your docker-compose.yml or docker-stack.yml file as shown below. The label `io.grafana.dashboard=true` is used by the config provider service to inject the dashboards into Grafana.
```yaml
# See grafana/docker-stack.yml
configs:
  # Grafana & Prometheus dashboards
  gf-dashboard-grafana-metrics:
    name: gf-dashboard-grafana-metrics-v1
    file: ./dashboards/grafana-metrics.json
    labels:
      io.grafana.dashboard: "true"
```

To inject Grafana provisioning configurations, you need to specify a config object in your docker-compose.yml or docker-stack.yml file as shown below.
There are two types of provisioning configurations:
- Dashboards: Use the `io.grafana.provisioning.dashboard=true` label to inject the provisioning configuration for dashboards.
- Datasources: Use the `io.grafana.provisioning.datasource=true` label to inject the provisioning configuration for data sources.
```yaml
# See grafana/docker-stack.yml
configs:
  # Grafana dashboards provisioning config
  gf-provisioning-dashboards:
    name: gf-provisioning-dashboards-v1
    file: ./provisioning/dashboards/grafana-dashboards.yml
    labels:
      io.grafana.provisioning.dashboard: "true"
  # Grafana datasources provisioning config
  gf-provisioning-datasource-prometheus:
    name: gf-provisioning-datasource-prometheus-v1
    file: ./provisioning/datasources/prometheus.yaml
    labels:
      io.grafana.provisioning.datasource: "true"
```

By design, the Prometheus server is configured to automatically discover and scrape the metrics from the Docker Swarm nodes, services and tasks. The default data retention is 182 days or ~6 months.
You can use Docker object labels in the deploy block to automagically register services as targets for Prometheus. It is also configured with config provider and config reloader services.
- `io.prometheus.enabled`: Enable the Prometheus scraping for the service.
- `io.prometheus.job_name`: The Prometheus job name. Default is `<docker_stack_namespace>/<service_name|job_name>`.
- `io.prometheus.scrape_scheme`: The scheme to scrape the metrics. Default is `http`.
- `io.prometheus.scrape_port`: The port to scrape the metrics. Default is `80`.
- `io.prometheus.metrics_path`: The path to scrape the metrics. Default is `/metrics`.
- `io.prometheus.param_<name>`: The Prometheus scrape parameters.
Example:
```yaml
# Annotations:
services:
  my-app:
    # ...
    networks:
      prometheus:
    deploy:
      # ...
      labels:
        io.prometheus.enabled: "true"
        io.prometheus.job_name: "my-app"
        io.prometheus.scrape_port: "8080"

# Due to a limitation of Docker Swarm, you need to attach the service to the prometheus network.
# This is required to allow the Prometheus server to scrape the metrics.
networks:
  prometheus:
    name: prometheus
    external: true
```

To register a custom scrape config, you need to specify a config object in your docker-compose.yml or docker-stack.yml file as shown below. The label `io.prometheus.scrape_config=true` is used by the Prometheus config provider service to inject the scrape config into Prometheus.
```yaml
# See cadvisor/docker-stack.yml
configs:
  prometheus-cadvisor:
    name: prometheus-cadvisor-v1
    file: ./prometheus/cadvisor.yml
    labels:
      io.prometheus.scrape_config: "true"
```

You can apply custom configurations to Prometheus via environment variables by running the `docker service update` command on the `promstack_prometheus` service:
```shell
# Set a custom scrape interval
docker service update --env-add PROMETHEUS_SCRAPE_INTERVAL=15s promstack_prometheus
# Remove the custom scrape interval
docker service update --env-rm PROMETHEUS_SCRAPE_INTERVAL promstack_prometheus
```

- `PROMETHEUS_SCRAPE_INTERVAL`: The scrape interval for Prometheus, default is `10s`
- `PROMETHEUS_SCRAPE_TIMEOUT`: The scrape timeout for Prometheus, default is `5`
- `PROMETHEUS_EVALUATION_INTERVAL`: The evaluation interval for Prometheus, default is `1m`
- `PROMETHEUS_CLUSTER_NAME`: The cluster name for Prometheus, default is `promstack`
- `PROMETHEUS_CLUSTER_REPLICA`: The cluster replica for Prometheus, default is `1`
- `PROMETHEUS_ALERTMANAGER_ADDR`: The Alertmanager service address
- `PROMETHEUS_ALERTMANAGER_PORT`: The Alertmanager service port, default is `9093`
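Several variables can be set in a single `docker service update` call, so the service is updated once rather than once per variable. The sketch below registers an Alertmanager address and port together; the helper names are hypothetical, and the host name passed in is a placeholder for wherever your Alertmanager is reachable.

```shell
#!/bin/sh
# Hypothetical helper: register an Alertmanager address and port in one update.
# $1 = Alertmanager host (placeholder), $2 = port (falls back to 9093, the documented default).
set_alertmanager() {
  docker service update \
    --env-add "PROMETHEUS_ALERTMANAGER_ADDR=$1" \
    --env-add "PROMETHEUS_ALERTMANAGER_PORT=${2:-9093}" \
    promstack_prometheus
}

# Hypothetical helper: remove both variables again.
unset_alertmanager() {
  docker service update \
    --env-rm PROMETHEUS_ALERTMANAGER_ADDR \
    --env-rm PROMETHEUS_ALERTMANAGER_PORT \
    promstack_prometheus
}
```

For example, `set_alertmanager alertmanager 9093` sets both values, and `unset_alertmanager` removes them.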
The following services and ports are exposed by the stack:

| Service | Port | Mode | Cluster DNS |
|---|---|---|---|
| Grafana | 3000 | | `grafana.svc.cluster.local` |
| Prometheus | 9090 | | `prometheus.svc.cluster.local` |
| Pushgateway | | | `pushgateway.svc.cluster.local` |
| Blackbox exporter | | | `blackbox-exporter.svc.cluster.local` |
| cAdvisor | 18080 | `host` | |
| Node exporter | 19100 | `host` | |
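The listed ports can be probed from a machine that can reach a Swarm node. This is a minimal sketch under a few assumptions: the helper names are hypothetical, the host defaults to `localhost`, and the paths are the standard health or metrics endpoints of each upstream project (`/api/health` for Grafana, `/-/ready` for Prometheus, `/metrics` for cAdvisor and Node exporter) rather than anything promstack defines.

```shell
#!/bin/sh
# Hypothetical helper: succeeds unless the request fails or returns HTTP status >= 400.
probe() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# $1 = address of a Swarm node where the ports are published (assumed: localhost).
check_endpoints() {
  host="${1:-localhost}"
  for url in \
    "http://$host:3000/api/health" \
    "http://$host:9090/-/ready" \
    "http://$host:18080/metrics" \
    "http://$host:19100/metrics"
  do
    if probe "$url"; then
      echo "up: $url"
    else
      echo "down: $url"
    fi
  done
}
```

`check_endpoints` prints one `up:` or `down:` line per endpoint, which gives a quick view of which services are reachable.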
If the Grafana dashboards are not present, restart the grafana service to reload the dashboards.

```shell
# Force-updating the service restarts it, which reloads the dashboards.
docker service update --force promstack_grafana
```

Please ensure the services are attached to the prometheus network. This is required to allow the Prometheus server to scrape the metrics.
```yaml
# Annotations:
services:
  my-app:
    # ...
    networks:
      prometheus:
    deploy:
      # ...
      labels:
        io.prometheus.enabled: "true"
        io.prometheus.job_name: "my-app"
        io.prometheus.scrape_port: "8080"

# Due to a limitation of Docker Swarm, you need to attach the service to the prometheus network.
# This is required to allow the Prometheus server to scrape the metrics.
networks:
  prometheus:
    name: prometheus
    external: true
```

Licensed under the MIT License. See LICENSE for more information.