Log-based distribution metric counts each event 10 times
In our project, we noticed a peculiar discrepancy between the distribution metric values reported by Google Cloud itself and those exported to Prometheus by stackdriver_exporter. Upon further investigation, we narrowed the misbehavior down to what the title states: each event is seemingly counted exactly 10 times, inflating the metric values to 10x the correct value.
We triggered three events one after another. In Google Cloud, the _count metric correctly jumps to 1 for each event as it comes in:
In Prometheus, using stackdriver_exporter, the _count metric incorrectly jumps to 10 for each event as it comes in:
Later, we also triggered the same request twice, and indeed, the stackdriver_exporter metric value increased first to 10, then to 20:
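To confirm that the discrepancy is a constant factor rather than a one-off spike, the per-step increments of the two series can be compared. The sketch below is a hypothetical helper (the name overcount_factor is ours); it assumes one sample is taken after each event and that the expected cumulative count starts at 0 and rises by 1 per event, as Google Cloud reports.

```python
def overcount_factor(expected: list[float], observed: list[float]) -> float:
    """Return the constant ratio between per-step increments of two cumulative series.

    Raises ValueError if the ratio differs between steps.
    """
    if len(expected) != len(observed) or len(expected) < 2:
        raise ValueError("need two series of equal length with at least two samples")
    ratios = set()
    for i in range(1, len(expected)):
        exp_inc = expected[i] - expected[i - 1]
        obs_inc = observed[i] - observed[i - 1]
        if exp_inc == 0:
            raise ValueError("expected series must increase at every step")
        ratios.add(obs_inc / exp_inc)
    if len(ratios) != 1:
        raise ValueError(f"increments are not a constant multiple: {sorted(ratios)}")
    return ratios.pop()


# Two identical requests: Google Cloud counts 1 each, the exporter 10 each.
print(overcount_factor([0, 1, 2], [0, 10, 20]))  # 10.0
```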
We set up stackdriver_exporter via Helm, using the prometheus-stackdriver-exporter chart 4.6.2, which corresponds to stackdriver_exporter 0.16.0.
Our stackdriver_exporter (Helm) configuration:
```yaml
prometheus-stackdriver-exporter:
  nameOverride: 'gcloud-metrics-exporter'
  fullnameOverride: 'gcloud-metrics-exporter'
  stackdriver:
    projectId: [redacted]
    serviceAccountKey: [redacted]
    metrics:
      typePrefixes: [redacted]
      # Workaround needed for accurate histogram metrics
      # https://github.com/prometheus-community/stackdriver_exporter?tab=readme-ov-file#what-to-know-about-aggregating-delta-metrics
      aggregateDeltas: true
      aggregateDeltasTTL: '30m' # default value from https://github.com/prometheus-community/helm-charts/blob/f2aeaf773cd22ae2bffb7ec846b06eadf4169387/charts/prometheus-stackdriver-exporter/values.yaml#L88 for lack of better judgment
  serviceMonitor:
    enabled: true
    namespace: monitoring
```
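For reference, this is the behavior we expected from aggregateDeltas: true. Each delta point should be added to the cumulative total exactly once, even though consecutive scrapes query overlapping time windows. The Python sketch below models that expectation. It is a hypothetical illustration written for this issue, not stackdriver_exporter's actual implementation (which is written in Go). The names DeltaPoint and DeltaAggregator, the de-duplication by interval end time, and the reading of aggregateDeltasTTL as "drop series not updated within the TTL" are all our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class DeltaPoint:
    """One delta data point: the number of events counted in [start, end)."""
    start: float  # interval start time (seconds)
    end: float    # interval end time (seconds)
    count: int    # events observed in the interval


@dataclass
class CumulativeSeries:
    """Running total for one series, as a Prometheus counter would expose it."""
    total: int = 0
    last_end: float = float("-inf")  # end time of the newest point already added
    last_seen: float = 0.0           # time of the last update, used for TTL eviction


class DeltaAggregator:
    """Hypothetical delta-to-cumulative aggregator (assumption, not the exporter's code)."""

    def __init__(self, ttl_seconds: float = 30 * 60):
        self.ttl_seconds = ttl_seconds
        self.series: dict[str, CumulativeSeries] = {}

    def ingest(self, key: str, points: list[DeltaPoint], now: float) -> int:
        """Add unseen delta points to the series total and return the new total."""
        s = self.series.setdefault(key, CumulativeSeries())
        for pt in sorted(points, key=lambda p: p.end):
            # Overlapping scrape windows return the same point again; skip it
            # so every event is counted once.
            if pt.end <= s.last_end:
                continue
            s.total += pt.count
            s.last_end = pt.end
        s.last_seen = now
        return s.total

    def evict_stale(self, now: float) -> None:
        """Forget series that have not been updated within the TTL."""
        stale = [k for k, s in self.series.items() if now - s.last_seen > self.ttl_seconds]
        for key in stale:
            del self.series[key]
```

Under this model, three single-event delta points seen by three overlapping scrapes yield totals of 1, 2 and 3. The exporter instead reported increments of 10 per event.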