returns 500 status code with message "metric was collected before with the same name and label values"
ivanitskiy opened this issue · 3 comments
Describe the bug
nginx-prometheus-exporter returns a 500 status code with the message "metric was collected before with the same name and label values" when Prometheus scrapes metrics. This leads to Prometheus reporting the service/endpoint as down.
This is an intermittent issue for us. We have 1k+ pods with nginx-prometheus-exporter in 13 k8s clusters and occasionally see this issue.
To reproduce
Steps to reproduce the behavior:
- Deploy using the sidecar pattern next to NGINX Plus in a pod; here is the relevant part of the pod spec:
- name: nginx-metrics-exporter
  image: 'nginx/nginx-prometheus-exporter:0.8.0'
  ports:
    - name: metrics-http
      containerPort: 9113
      protocol: TCP
  env:
    - name: NGINX_PLUS
      value: 'true'
    - name: NGINX_RETRIES
      value: '10'
    - name: NGINX_RETRY_INTERVAL
      value: 30s
    - name: SCRAPE_URI
      value: 'http://localhost:52443/api'
    - name: POD_ID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.uid
    - name: CONST_LABELS
      value: 'pod_id=$(POD_ID)'
  resources:
    limits:
      cpu: 20m
      memory: 100Mi
    requests:
      cpu: 5m
      memory: 20Mi
  volumeMounts:
    - name: default-token-tngb5
      readOnly: true
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  readinessProbe:
    httpGet:
      path: /
      port: 9113
      scheme: HTTP
    timeoutSeconds: 1
    periodSeconds: 10
    successThreshold: 1
    failureThreshold: 3
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  imagePullPolicy: IfNotPresent
- nginx-prometheus-exporter logs:
2021/12/28 19:41:12 Starting NGINX Prometheus Exporter Version= GitCommit=
2021/12/28 19:41:12 Listening on :9113
2021/12/28 19:41:12 NGINX Prometheus Exporter has successfully started
- Prometheus reports the service as down because it couldn't fetch metrics.
I can validate that via curl:
curl -v http://localhost:9113/metrics
* About to connect() to localhost port 9113 (#0)
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9113 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9113
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Tue, 28 Dec 2021 19:52:39 GMT
< Transfer-Encoding: chunked
<
An error has occurred while serving metrics:
28 error(s) occurred:
* collected metric "nginxplus_upstream_server_state" { label:<name:"pod_id" value:"82773ae7-03c5-456f-831d-b5f5a765e7a7" > label:<name:"server" value:"10.10.0.1:80" > label:<name:"upstream" value:"backendhttp" > gauge:<value:1 > } was collected before with the same name and label values
<omitted similar messages>
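For context, this error text comes from the Prometheus Go client's gather step, which rejects any sample whose metric name and full label set have already been seen in the same scrape. The following is a simplified, stdlib-only sketch of that kind of duplicate check, not the exporter's actual code; `sampleKey` and `checkDuplicates` are illustrative names:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sampleKey builds a canonical identity for a sample: metric name plus
// its label pairs sorted by label name. Two samples with the same key
// count as "the same name and label values".
func sampleKey(name string, labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return name + "{" + strings.Join(pairs, ",") + "}"
}

// checkDuplicates returns one error per sample whose name and label
// values were already collected, similar to the 500 body in this issue.
func checkDuplicates(samples []struct {
	Name   string
	Labels map[string]string
}) []error {
	seen := map[string]bool{}
	var errs []error
	for _, s := range samples {
		key := sampleKey(s.Name, s.Labels)
		if seen[key] {
			errs = append(errs, fmt.Errorf(
				"collected metric %q %v was collected before with the same name and label values",
				s.Name, s.Labels))
			continue
		}
		seen[key] = true
	}
	return errs
}

func main() {
	labels := map[string]string{"upstream": "backendhttp", "server": "10.10.0.1:80"}
	samples := []struct {
		Name   string
		Labels map[string]string
	}{
		{"nginxplus_upstream_server_state", labels},
		{"nginxplus_upstream_server_state", labels}, // duplicate: same name + label values
	}
	for _, err := range checkDuplicates(samples) {
		fmt.Println(err)
	}
}
```

In other words, the intermittent 500 suggests the collector occasionally emits the same upstream-server sample twice within a single scrape.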
Expected behavior
200 OK status code when Prometheus scrapes the metrics
Your environment
- Version of the Prometheus exporter: 0.8.0
- Version of Docker/Kubernetes: Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
- Using NGINX Plus
Additional context
Prometheus scrape config:
- job_name: monitoring/nginx-metrics-exporter/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - ns
Hi @ivanitskiy, thanks for reporting this.
Would it be possible for you to test if you still have this issue with our latest release https://github.com/nginxinc/nginx-prometheus-exporter/releases/tag/v0.10.0?
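For reference, trying the newer release would only require bumping the image tag on the sidecar container defined in the pod spec above (a sketch; the container name is taken from the spec earlier in this issue):

```
- name: nginx-metrics-exporter
  image: 'nginx/nginx-prometheus-exporter:0.10.0'
```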
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.