prometheus-community/stackdriver_exporter

Log based distribution metric counts each event 10 times


In our project we noticed a discrepancy between the distribution metric values reported by Google Cloud itself and the values exported into Prometheus by stackdriver_exporter. Upon further investigation, we narrowed the misbehavior down to what the title states: each event appears to be counted exactly 10 times, inflating the metric values to 10x the correct value.

We triggered three events one after another. In Google Cloud, the _count metric correctly jumps to 1 for each event as it comes in:

image

In Prometheus, using stackdriver_exporter, the _count metric incorrectly jumps to 10 for each event as it comes in:

image

Later, we also tried triggering the same request twice, and indeed the stackdriver_exporter metric value increased first to 10, then to 20:

image
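
To make the "exactly 10 times" claim easy to check against other samples, here is a minimal Python sketch of the comparison we did by eye. The helper name `inflation_factor` is hypothetical, and the Google Cloud side values `[1, 2]` are an assumption (the running total of two single-count events); only the exporter values 10 and 20 are taken from the observation above.

```python
def inflation_factor(expected, observed):
    """Return observed/expected if that ratio is the same for every sample, else None."""
    # Collect the distinct ratios; skip samples where the expected count is zero.
    ratios = {o / e for e, o in zip(expected, observed) if e != 0}
    return ratios.pop() if len(ratios) == 1 else None


# Assumption: running total on the Google Cloud side after each of the two requests.
cloud_counts = [1, 2]
# Values seen in Prometheus via stackdriver_exporter after each request.
exporter_counts = [10, 20]

print(inflation_factor(cloud_counts, exporter_counts))  # 10.0
```

A constant factor across samples points at systematic multiplication of each event rather than at sporadic duplicate scrapes, which would give varying ratios.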


We set up stackdriver_exporter via Helm, using prometheus-stackdriver-exporter 4.6.2, which corresponds to stackdriver_exporter 0.16.0.

Our stackdriver_exporter (Helm) configuration:

prometheus-stackdriver-exporter:
  nameOverride: 'gcloud-metrics-exporter'
  fullnameOverride: 'gcloud-metrics-exporter'
  stackdriver:
    projectId: [redacted]
    serviceAccountKey: [redacted]
    metrics:
      typePrefixes: [redacted]
      # Workaround needed for accurate histogram metrics
      # https://github.com/prometheus-community/stackdriver_exporter?tab=readme-ov-file#what-to-know-about-aggregating-delta-metrics
      aggregateDeltas: true
      aggregateDeltasTTL: '30m' # default value from https://github.com/prometheus-community/helm-charts/blob/f2aeaf773cd22ae2bffb7ec846b06eadf4169387/charts/prometheus-stackdriver-exporter/values.yaml#L88 for lack of better judgment
  serviceMonitor:
    enabled: true
    namespace: monitoring