metrics_analyzer
is a command line tools which helps checking metrics between dashboards of grafana, prometheus rules and prometheus metrics and outputs whitelisted metrics or metrics to drop.
- Connect via API to Grafana and get all metrics which are use in any dashboards
- Connect via API to Prometheus and get all metrics which are use in any rules
- Connect via API to Prometheus and check exist metrics
- Generates regex output to add to service monitor
at least Python 3.10
To install metrics_analyzer, run this command directly from repository:
$ pip install git+https://github.com/oleksii-boiko-ua/metrics_analyzer.git
To use this tool locally, you will need to create port-forwards for your grafana and prometheus services.
$ kubectl port-forward svc/grafana 3000:80
$ kubectl port-forward svc/prometheus 9090:9090
Also, if you use an API key for Grafana you should set it in you environment.
$ export GRAFANA_KEY=...
Now you should be able to run the following script:
$ metrics_analyzer metrics-to-whitelist
Output example:
We can whitelist 10 metrics from the job argocd-metrics
process_cpu_seconds_total|ALERTS|go_goroutines|workqueue_depth|process_start_time_seconds|process_resident_memory_bytes|workqueue_adds_total|up|argocd_app_info|workqueue_queue_duration_seconds_bucket
We can whitelist 55 metrics from the job kube-state-metrics
kube_job_status_start_time|kube_deployment_status_replicas|kube_statefulset_status_replicas_ready|kube_pod_container_resource_requests|kube_node_spec_taint|kube_namespace_created|kube_node_status_allocatable|kube_pod_container_resource_limits|kube_deployment_labels|kube_node_info
We can whitelist 10 metrics from the job prometheus-grafana
process_cpu_seconds_total|grafana_http_request_duration_seconds_bucket|go_goroutines|process_start_time_seconds|grafana_build_info|process_resident_memory_bytes|grafana_http_request_duration_seconds_sum|grafana_stat_totals_dashboard|up|grafana_http_request_duration_seconds_count
example of metric relabelings for service monitor to whitelist recomended metrics
metricRelabelings:
- action: keep
regex: 'process_cpu_seconds_total|go_goroutines|prometheus_operator_watch_operations_total'
sourceLabels: [__name__]