
returns 500 status code with message metric was collected before with the same name and label values

Describe the bug
nginx-prometheus-exporter returns a 500 status code with the message "metric was collected before with the same name and label values" when Prometheus scrapes metrics. As a result, Prometheus reports the service/endpoint as down.

This is an intermittent issue for us. We have 1k+ pods with nginx-prometheus-exporter in 13 k8s clusters and occasionally see this issue.
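
For context, the message appears to come from a uniqueness check in the Prometheus client library: within one scrape, every sample must have a distinct combination of metric name and label values. The Python sketch below only illustrates that rule. It is not the exporter's or the client library's code, and the `server` addresses and the `find_duplicate_series` helper are hypothetical placeholders.

```python
from collections import Counter


def find_duplicate_series(samples):
    """Return the (name, labels) keys that occur more than once.

    `samples` is an iterable of (metric_name, labels_dict) pairs.
    Label order does not matter, so labels are sorted before comparing.
    """
    counts = Counter(
        (name, tuple(sorted(labels.items()))) for name, labels in samples
    )
    return [key for key, n in counts.items() if n > 1]


# Hypothetical samples: the first two share the same name and label values,
# which is the situation the error message describes.
samples = [
    ("nginxplus_upstream_server_state", {"upstream": "backendhttp", "server": "10.0.0.1:80"}),
    ("nginxplus_upstream_server_state", {"upstream": "backendhttp", "server": "10.0.0.1:80"}),
    ("nginxplus_upstream_server_state", {"upstream": "backendhttp", "server": "10.0.0.2:80"}),
]

duplicates = find_duplicate_series(samples)
for name, labels in duplicates:
    print(name, dict(labels))
```

If two upstream peers ever produced identical label values (an assumption, not something confirmed here), a check like this would flag them.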

To reproduce
Steps to reproduce the behavior:

  1. Deploy the exporter using the sidecar pattern next to NGINX Plus in a pod. Here is the relevant part of the pod spec:
    - name: nginx-metrics-exporter
      image: 'nginx/nginx-prometheus-exporter:0.8.0'
      ports:
        - name: metrics-http
          containerPort: 9113
          protocol: TCP
      env:
        - name: NGINX_PLUS
          value: 'true'
        - name: NGINX_RETRIES
          value: '10'
        - name: NGINX_RETRY_INTERVAL
          value: 30s
        - name: SCRAPE_URI
          value: 'http://localhost:52443/api'
        - name: POD_ID
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.uid
        - name: CONST_LABELS
          value: 'pod_id=$(POD_ID)'
      resources:
        limits:
          cpu: 20m
          memory: 100Mi
        requests:
          cpu: 5m
          memory: 20Mi
      volumeMounts:
        - name: default-token-tngb5
          readOnly: true
          mountPath: /var/run/secrets/
          path: /
          port: 9113
          scheme: HTTP
        timeoutSeconds: 1
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  2. nginx-metrics-exporter logs:
2021/12/28 19:41:12 Starting NGINX Prometheus Exporter Version= GitCommit=
2021/12/28 19:41:12 Listening on :9113
2021/12/28 19:41:12 NGINX Prometheus Exporter has successfully started
  3. Prometheus reports the service as down because it couldn't fetch metrics:


I can validate that via curl:

curl -v http://localhost:9113/metrics
* About to connect() to localhost port 9113 (#0)
* Trying
* Connected to localhost ( port 9113 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9113
> Accept: */*
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Tue, 28 Dec 2021 19:52:39 GMT
< Transfer-Encoding: chunked
An error has occurred while serving metrics:

28 error(s) occurred:
* collected metric "nginxplus_upstream_server_state" { label:<name:"pod_id" value:"82773ae7-03c5-456f-831d-b5f5a765e7a7" > label:<name:"server" value:"" > label:<name:"upstream" value:"backendhttp" > gauge:<value:1 > } was collected before with the same name and label values

<omitted similar messages>
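
Since this affects only some of our 1k+ pods, a small script can help triage which metrics and label sets collide in a given error response. Below is a minimal Python sketch that parses the error body shown above. The function name `parse_duplicates` and the regular expressions are my own assumptions based on this one output sample, not part of the exporter.

```python
import re

# Matches one error line of the form seen in the 500 response body:
# * collected metric "<name>" { ... } was collected before with the same name and label values
DUPLICATE_RE = re.compile(
    r'^\* collected metric "(?P<name>[^"]+)" \{(?P<body>.*)\} '
    r'was collected before with the same name and label values$'
)
# Matches each label:<name:"..." value:"..." > pair inside the braces.
LABEL_RE = re.compile(r'label:<name:"([^"]*)" value:"([^"]*)" >')


def parse_duplicates(body):
    """Return a list of (metric_name, labels_dict) for each duplicate reported."""
    found = []
    for line in body.splitlines():
        match = DUPLICATE_RE.match(line.strip())
        if match:
            labels = dict(LABEL_RE.findall(match.group("body")))
            found.append((match.group("name"), labels))
    return found


# Error body copied from the curl output in this report (one of the 28 lines).
sample = '''An error has occurred while serving metrics:

28 error(s) occurred:
* collected metric "nginxplus_upstream_server_state" { label:<name:"pod_id" value:"82773ae7-03c5-456f-831d-b5f5a765e7a7" > label:<name:"server" value:"" > label:<name:"upstream" value:"backendhttp" > gauge:<value:1 > } was collected before with the same name and label values
'''

dups = parse_duplicates(sample)
for name, labels in dups:
    print(name, labels)
```

Grouping the parsed results by `upstream` and `server` across pods should show whether the same label set is always the one that collides.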

Expected behavior
200 OK status code when Prometheus scrapes the metrics

Your environment

  • Version of the Prometheus exporter - release version or a specific commit

  • Version of Docker/Kubernetes

Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
  • Using NGINX Plus

Additional context

Prometheus scrape config:

- job_name: monitoring/nginx-metrics-exporter/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - ns

Hi @ivanitskiy, thanks for reporting this.

Would it be possible for you to test whether you still have this issue with our latest release?

