improve Prometheus integration
thesuperzapper opened this issue · 0 comments
While we currently have limited support for Prometheus through our `serviceMonitor.*` and `prometheusRule.*` values, we can make the situation much better.
Tasks:
- Implement prometheus/statsd_exporter
  - OPTION 1: as a sidecar container for airflow pods
    - This option is recommended by the creators of `statsd_exporter` (need to check why, but it's probably so that Prometheus associates the metrics with the actual Pod that is generating them)
    - I am not sure if all Pods will need the sidecar or just the scheduler
    - We would then configure `AIRFLOW__METRICS__STATSD_HOST` to be `localhost`
    - We would annotate the Pods with the sidecar to have `prometheus.io/scrape: "true"` and `prometheus.io/port: "xxxx"`
  - OPTION 2: as a central deployment
    - This would reduce the number of containers
    - We would then configure `AIRFLOW__METRICS__STATSD_HOST` to be the service of this deployment (but this is possibly a security risk, as other pods could send bad data if no NetworkPolicy prevents invalid access)
    - We would annotate the Deployment Pods to have `prometheus.io/scrape: "true"` and `prometheus.io/port: "xxxx"`
- Implement prometheus-community/pgbouncer_exporter
  - This is probably best implemented as a sidecar of our PgBouncer Deployment
  - We would annotate the Deployment Pods to have `prometheus.io/scrape: "true"` and `prometheus.io/port: "xxxx"`
- Consider what to do with the Prometheus Operator resource values
  - The existing `serviceMonitor.*` and `prometheusRule.*` values could be automatically configured (but there is an argument that these are configs for the user's Prometheus, and should not be managed by the chart)
  - For some reason, these resources are currently stored under the `template/webserver/` folder (when they are not really specific to the webserver)
- Update the docs about Prometheus
  - Update the "How to integrate airflow with Prometheus?" page
  - Add some example Grafana dashboards for airflow
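
For illustration, OPTION 1 of the statsd_exporter task might look roughly like the Pod below. This is only a sketch, not the chart's actual template. The Pod name, the images, the `args`, and the extra `AIRFLOW__METRICS__STATSD_ON` / `AIRFLOW__METRICS__STATSD_PORT` settings are assumptions. The ports 9125 (StatsD ingest) and 9102 (metrics endpoint) are what I understand to be the statsd_exporter defaults, and they stand in for the "xxxx" placeholder used in the tasks.

```yaml
# Sketch only: one possible sidecar layout for OPTION 1 (names are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: airflow-scheduler-example        # hypothetical name
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9102"           # assumed: statsd_exporter default metrics port
spec:
  containers:
    - name: airflow-scheduler
      image: apache/airflow              # tag left out on purpose
      args: ["scheduler"]
      env:
        - name: AIRFLOW__METRICS__STATSD_ON
          value: "True"
        - name: AIRFLOW__METRICS__STATSD_HOST
          value: "localhost"             # the sidecar shares the Pod's network namespace
        - name: AIRFLOW__METRICS__STATSD_PORT
          value: "9125"                  # assumed: statsd_exporter default StatsD port
    - name: statsd-exporter
      image: prom/statsd-exporter        # tag left out on purpose
      ports:
        - name: statsd
          containerPort: 9125
          protocol: UDP
        - name: metrics
          containerPort: 9102
          protocol: TCP
```

Note that the `prometheus.io/*` annotations are only a convention: they have an effect only if the user's Prometheus scrape config has relabeling rules that read them. OPTION 2 would move the `statsd-exporter` container into its own Deployment and point `AIRFLOW__METRICS__STATSD_HOST` at that Deployment's Service instead of `localhost`.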
Completing these tasks should replace the need for the following issues: