/docker_monitoring_logging_alerting

Docker host and container monitoring, logging and alerting out of the box using cAdvisor, Prometheus, Grafana for monitoring, Elasticsearch, Kibana and Logstash for logging and elastalert and Alertmanager for alerting.

Primary LanguageShell

If you have any feedback regarding this monitoring/logging/alerting suite, any ideas for improvement, fixes, questions or comments, please feel free to contact me or do a PR!

What is this?

Blog post on Medium with some more elaboration.

This is an out of the box monitoring, logging and alerting suite for Docker-hosts and their containers, complete with dashboards to monitor and explore your host and container logs and metrics.

Monitoring: cAdvisor and node_exporter for collection, Prometheus for storage, Grafana for visualisation.

Logging: Filebeat for collection and log-collection and forwarding, Logstash for aggregation and processing, Elasticsearch as datastore/backend and Kibana as the frontend.

Alerting: elastalert as a drop-in for Elastic.io's Watcher for alerts triggered by certain container or host log events and Prometheus' Alertmanager for alerts regarding metrics.

grafana_screenshot

kibana_screenshot

alerts_screenshot

WARNING: This configuration is for testing purposes only. As it is simply forwarding ports at the moment, if your box is accessible publicly, all your logs and metrics will be out in the open. Switch off port forwarding before using this in an "online" environment.

The Grafana dashboard (a bit slimmed down) can also be found on grafana.net: https://grafana.net/dashboards/395.

How to set it up?

This suite comes with storage directories for Kibana and Grafana that contain the configuration for the data sources and dashboards. these directories will be mounted into the two containers as volumes. This is for your convenience and eliminates some manual setup steps.

  1. git clone this repository: git clone https://github.com/uschtwill/docker_monitoring_logging_alerting.git
  2. cd into the folder: cd docker_monitoring_logging_alerting
  3. Run the setup script: sh setup.sh
  4. Enjoy and explore your logs and metrics:
  • To explore any metric that's collected without having to build queries: localhost:3000/dashboard/db/data-exploration.
  • To see all monitoring alerts and their status in prometheus: localhost:9090/alerts.
  • To manage your monitoring alerts (e.g. silence them) in Alertmanager: localhost:9093/#/alerts. Elastalert (logging alerts) unfortunately does not have a frontend.
  • Just to see what the cAdvisor frontend looks like (you'll use Grafana for looking at monitoring metrics anyways): localhost:8080/containers/
  • To say hello to your Elasticsearch instance: curl localhost:9200.
  1. Run any containers with the same logging options as defined in this suite's docker-compose.ymland add a container_group label to enable monitoring, logging and alerting for them.
  2. AFTER you're done testing this suite and you want to revert to the state before setting up, run the cleanup script to clean up after yourself: sh cleanup.sh

Alerting and Annotations in Grafana

This suite uses elastalert and Alertmanager for alerting. Rules for logging alerts (elastalert) go into ./elastalert/rules/ and rules for monitoring alerts (Alertmanager) go into ./prometheus/rules/. Alertmanager only takes care of the communications part the monitoring alerts, the rules themselves are defined "in" Prometheus.

Both Alertmanager and elastalert can be configured to send their alerts to various outputs. In this suite, Logstash and Slack are set up. The integration with Logstash works out of the box, for adding Slack you will need to insert your webhook url.

The alerts that are sent to Logstash can be checked by looking at the 'logstash-alerts' index in Kibana. Apart from functioning as a first output, sending and storing the alerts to Elasticsearch via Logstash is also neat because it allows us to query them from Grafana and have them imported to its Dashboards as annotations.

annotations_screenshot

The monitoring alerting rules, which are stored in the Prometheus directory, contain a fake alert that should be firing from the beginning and demonstrates the concept. Find it and comment it out to have some peace. Also, there should be logging alerts coming in soon as well, this suite by itself already consists of 10 containers, and something is always complaining. Of course you can also force things by breaking stuff yourself - the blanket_log-level_catch.yaml rule that's already set up should catch it.

If you're annoyed by non-events repeatedly triggering alerts, throw them in ./logstash/config/31-non-events.conf in order for logstash to silence them by overwritting their log_level upon import.

Grafana/Prometheus Query Building

Unfortunately Grafana doesn't appear to have a fancy query builder for Prometheus as it has for Graphite or InfluxDB, instead one has to plainly type out one's queries.

Alas, when building Grafana graphs/dashboards with Prometheus as a data storage, knowing it's query dsl and metric types is important. This also means, that documentation about using Grafana with an InfluxDB won't help you much, further narrowing down the number of available resources. This is kind of unfortunate.

Here you can find the official documentation for Prometheus on both the query dsl and the metric types:

Information on Prometheus Querying

Information on Prometheus Metric Types

Furthermore, since I couldn't find proper documentation on the metrics cAdvisor and Prometheus/Node-Exporter expose, I decided to just take the info from the /metrics entpoints and bring it into a human-readable format.

Check them here. Combining the information on the exposed metrics themselves with that on Prometheus' query dsl and metric types, you should be good to go to build some beautiful dashboards yourself.

Known Issues

Bad umask: If your umask is bad, and not for example 0022, it could create files/folder with low permissions. Some containers do not start up when that is the case, e.g. Kibana can't read the configd. Setting this umask before downloading the git repo fixes this issue. (pointed out by @riemers)