Beaker dashboard monitoring machine reliability and availability

GNU General Public License v3.0 (GPL-3.0)

Beaker SRE

Documentation and tools for deploying a monitoring dashboard for Beaker.

Architecture

  1. Beaker stores all relevant information in its MariaDB database
  2. Prometheus exporter scripts compute metrics (SLIs) from the data and expose them through a built-in web server
  3. Prometheus regularly scrapes the exporters' endpoints and stores the metrics in its database
  4. Grafana uses Prometheus as its data source and visualizes the metrics on a dashboard
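
To make step 2 more concrete, here is a minimal sketch of what an exporter has to produce: samples in the Prometheus text exposition format. The function names, the metric name and the machine counts are made up for illustration; the real exporter scripts compute their values from Beaker's MariaDB database and may look quite different.

```shell
#!/bin/sh
# Hypothetical helper: print one gauge sample in the Prometheus text
# exposition format. Arguments: metric name, help text, value.
format_gauge() {
    name=$1
    help=$2
    value=$3
    printf '# HELP %s %s\n' "$name" "$help"
    printf '# TYPE %s gauge\n' "$name"
    printf '%s %s\n' "$name" "$value"
}

# Hypothetical SLI: share of working machines among all machines.
# A real exporter would get both counts from database queries.
availability_ratio() {
    working=$1
    total=$2
    awk -v w="$working" -v t="$total" \
        'BEGIN { if (t == 0) print 0; else printf "%.4f\n", w / t }'
}

# Example with made-up numbers: 95 of 100 machines are working.
format_gauge beaker_machines_available_ratio \
    "Share of machines currently usable (illustrative)" \
    "$(availability_ratio 95 100)"
```

Prometheus would scrape this text from the exporter's HTTP endpoint (step 3) and Grafana would then query the stored series (step 4).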

Deployment

All components are available as container images in public registries, so docker or podman would be the preferred deployment method. Described below is the manual deployment directly on the host, which I used on an older system without container support.

Prometheus

  1. create user prometheus
    1. useradd prometheus
  2. as prometheus user in its homedir:
    1. download latest release from https://prometheus.io/download/
    2. untar and symlink to prometheus directory
    3. deploy config file to the prometheus directory
  3. as root
    1. deploy service file to systemd
      1. cp prometheus.service /etc/systemd/system/prometheus.service
      2. systemctl enable prometheus.service
    2. start prometheus
      1. systemctl start prometheus.service
    3. allow access from outside (optional):
      1. firewall-cmd --add-port=9090/tcp --permanent
      2. firewall-cmd --reload
  4. Verify Prometheus functionality
    1. go to http://ip.of.prometheus.server:9090
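
As a rough illustration of the config file and service file used in the steps above, the following sketch writes a minimal example of each into a local staging directory. It is not a copy of the files shipped with this project: the scrape interval, the single self-scrape job, the `STAGE_DIR` variable and the `/home/prometheus/prometheus` path (the default home directory plus the symlink from step 2) are all assumptions.

```shell
#!/bin/sh
# Sketch only: stage an illustrative prometheus.yml and systemd unit.
# Nothing is installed; files land in $STAGE_DIR (default ./staging).
STAGE_DIR=${STAGE_DIR:-./staging}
mkdir -p "$STAGE_DIR"

# Assumed minimal configuration: Prometheus scrapes only itself.
cat > "$STAGE_DIR/prometheus.yml" <<'EOF'
global:
  scrape_interval: 1m

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF

# Assumed unit file; the paths rely on the symlinked prometheus directory.
cat > "$STAGE_DIR/prometheus.service" <<'EOF'
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
WorkingDirectory=/home/prometheus/prometheus
ExecStart=/home/prometheus/prometheus/prometheus --config.file=/home/prometheus/prometheus/prometheus.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Manual follow-up, not run by this sketch:
#   cp "$STAGE_DIR/prometheus.yml" /home/prometheus/prometheus/    (as prometheus)
#   cp "$STAGE_DIR/prometheus.service" /etc/systemd/system/        (as root)
#   curl -s http://localhost:9090/-/healthy                        (after start)
```

The `/-/healthy` endpoint is a quick command-line alternative to opening the web UI in step 4.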

Blackbox exporter

  1. create user beaker-sre
    1. useradd beaker-sre
  2. as beaker-sre user in its homedir:
    1. download latest binary release from https://github.com/prometheus/blackbox_exporter/releases
    2. untar and symlink to blackbox_exporter directory
  3. as root
    1. deploy service file to systemd
      1. cp blackbox_exporter.service /etc/systemd/system/blackbox_exporter.service
      2. systemctl enable blackbox_exporter.service
    2. start blackbox_exporter
      1. systemctl start blackbox_exporter.service
  4. Verify blackbox_exporter functionality
    1. systemctl status blackbox_exporter
    2. go to http://ip.of.prometheus.server:9090, Status -> Targets
      1. blackbox should be listed in blue color and as "(X/X up)"
      2. on the Graph page, try to execute a query using the blackbox_exporter's data, e.g. probe_http_duration_seconds
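
For the blackbox target to show up in Prometheus, the Prometheus config needs a scrape job that routes probes through the exporter. The sketch below stages such a job, following the relabeling pattern from the blackbox_exporter documentation, and builds a URL for probing a target by hand. The target `https://beaker.example.com`, the exporter address `localhost:9115` (the exporter's default port), the `STAGE_DIR` variable and the `probe_url` helper are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: stage an illustrative scrape job for the blackbox exporter.
STAGE_DIR=${STAGE_DIR:-./staging}
mkdir -p "$STAGE_DIR"

# Fragment to merge under scrape_configs: in prometheus.yml.
# Assumed probe target and exporter address; adjust both to your setup.
cat > "$STAGE_DIR/blackbox-scrape.yml" <<'EOF'
  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://beaker.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
EOF

# Hypothetical helper: build the URL that probes one target directly.
# Arguments: target URL, optional module name (defaults to http_2xx).
probe_url() {
    target=$1
    module=${2:-http_2xx}
    printf 'http://localhost:9115/probe?target=%s&module=%s\n' "$target" "$module"
}

probe_url https://beaker.example.com

# Manual check once the exporter runs (not executed here):
#   curl -s "$(probe_url https://beaker.example.com)" | grep probe_success
```

A `probe_success 1` line in the curl output indicates that the exporter reached the target, independently of whether Prometheus is already scraping it.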

Grafana

  1. create user grafana
    1. useradd grafana
  2. as grafana user in its homedir:
    1. download latest standalone binary release from https://grafana.com/grafana/download?plcmt=top-nav&cta=downloads
    2. untar and symlink to grafana directory
    3. install additional plugins
      1. ./grafana-cli --pluginsDir /home/grafana/grafana/data/plugins/ plugins install fzakaria-simple-annotations-datasource
      2. ./grafana-cli --pluginsDir /home/grafana/grafana/data/plugins/ plugins install simpod-json-datasource
  3. as root
    1. deploy service file to systemd
      1. cp grafana.service /etc/systemd/system/grafana.service
      2. systemctl enable grafana.service
    2. start grafana
      1. systemctl start grafana.service
    3. allow access from outside
      1. firewall-cmd --add-port=3000/tcp --permanent
      2. firewall-cmd --reload
  4. Verify Grafana functionality
    1. systemctl status grafana
    2. go to http://ip.of.prometheus.server:3000
  5. Log in as admin/admin, change password to something secure
  6. Connect Grafana to Prometheus
    1. in WebUI navigate to Configuration -> Data sources -> Add data source -> Prometheus -> Select
    2. Use http://localhost:9090 as the suggested URL, keep the rest on default values. Confirm.
  7. Set up alerting as you see fit
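
As an alternative to clicking through the WebUI in step 6, Grafana can also load data sources from provisioning files at startup. The sketch below stages such a file with the same URL as above. The file name, the `STAGE_DIR` variable and the target directory `/home/grafana/grafana/conf/provisioning/datasources/` (Grafana's default provisioning path relative to the install directory used in this guide) are assumptions; check them against your Grafana version.

```shell
#!/bin/sh
# Sketch only: stage an illustrative Grafana data source provisioning file.
STAGE_DIR=${STAGE_DIR:-./staging}
mkdir -p "$STAGE_DIR"

# Same settings as the WebUI steps: Prometheus on localhost:9090.
cat > "$STAGE_DIR/prometheus-datasource.yaml" <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF

# Manual follow-up, not run by this sketch:
#   cp "$STAGE_DIR/prometheus-datasource.yaml" \
#      /home/grafana/grafana/conf/provisioning/datasources/   (as grafana)
#   systemctl restart grafana.service                         (as root)
#   curl -s http://localhost:3000/api/health                  (quick check)
```

Keeping the data source in a file makes the setup repeatable, at the cost of one more file to deploy alongside the service file.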