airflow-helm/charts

improve Prometheus integration

thesuperzapper opened this issue · 0 comments

While we currently have limited support for Prometheus through our serviceMonitor.* and prometheusRule.* values, we can make the situation much better.

Tasks:

  1. Implement prometheus/statsd_exporter
    1. OPTION 1: as a sidecar container for airflow pods
      • This option is recommended by the creators of statsd_exporter (need to check why, but it's probably so that Prometheus associates the metrics with the actual Pod that is generating them)
      • I am not sure if all Pods will need the sidecar or just the scheduler
      • We would then configure AIRFLOW__METRICS__STATSD_HOST to be localhost
      • We would annotate the Pods with the sidecar to have prometheus.io/scrape: "true" and prometheus.io/port: "xxxx"
    2. OPTION 2: as a central deployment
      • This would reduce the number of containers
      • We would then configure AIRFLOW__METRICS__STATSD_HOST to be the service of this deployment (but this is possibly a security risk, as other pods could send bad data if no NetworkPolicy prevents invalid access)
      • We would annotate the Deployment Pods to have prometheus.io/scrape: "true" and prometheus.io/port: "xxxx"
  2. Implement prometheus-community/pgbouncer_exporter
    • This is probably best implemented as a sidecar of our PgBouncer Deployment
    • We would annotate the Deployment Pods to have prometheus.io/scrape: "true" and prometheus.io/port: "xxxx"
  3. Consider what to do with the Prometheus Operator resource values
    • The existing serviceMonitor.* and prometheusRule.* values could be automatically configured (but there is an argument that these are configs for the user's Prometheus, and should not be managed by the chart).
    • For some reason, these resources are currently stored under the template/webserver/ folder, even though they are not specific to the webserver
  4. Update the docs about Prometheus
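
As a rough illustration of OPTION 1 from task 1, a Pod template with the sidecar might look something like the fragment below. This is only a sketch, not the chart's actual template: the container names and untagged images are placeholders, and the ports are an assumption that we keep statsd_exporter's defaults (9125 for StatsD ingestion, 9102 for the Prometheus endpoint) in place of the "xxxx" above.

```yaml
# Hypothetical Pod template fragment (not taken from the chart).
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9102"        # assumed: statsd_exporter's default web port
spec:
  containers:
    - name: airflow-scheduler         # placeholder name
      image: apache/airflow           # pin a tag in practice
      env:
        - name: AIRFLOW__METRICS__STATSD_ON
          value: "True"
        - name: AIRFLOW__METRICS__STATSD_HOST
          value: "localhost"          # the sidecar shares the Pod's network namespace
        - name: AIRFLOW__METRICS__STATSD_PORT
          value: "9125"               # must match the sidecar's StatsD listener
    - name: statsd-exporter           # placeholder name
      image: prom/statsd-exporter     # pin a tag in practice
      args:
        - "--statsd.listen-udp=:9125"
        - "--web.listen-address=:9102"
      ports:
        - name: statsd
          containerPort: 9125
          protocol: UDP
        - name: metrics
          containerPort: 9102
          protocol: TCP
```

With this layout Airflow only ever talks to localhost, so the StatsD port does not need to be exposed through a Service.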

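For OPTION 2 from task 1, the security concern could be mitigated with a NetworkPolicy along the lines of the sketch below. The resource name and both label selectors are hypothetical (the chart's real labels may differ), and the ports again assume statsd_exporter's defaults.

```yaml
# Hypothetical NetworkPolicy for a central statsd_exporter Deployment.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: statsd-exporter-ingress       # placeholder name
spec:
  podSelector:
    matchLabels:
      app: statsd-exporter            # hypothetical label on the exporter Pods
  policyTypes:
    - Ingress
  ingress:
    # StatsD traffic: only from Airflow Pods in the same namespace.
    - from:
        - podSelector:
            matchLabels:
              app: airflow            # hypothetical label on the Airflow Pods
      ports:
        - protocol: UDP
          port: 9125
    # Prometheus scrapes: no "from" clause, so any source is allowed;
    # this could be narrowed to the Prometheus Pods.
    - ports:
        - protocol: TCP
          port: 9102
```

Note that a NetworkPolicy only takes effect if the cluster's network plugin enforces it.
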
Completing these tasks should remove the need for the following issues: