returns 500 status code with message "metric was collected before with the same name and label values"
ivanitskiy opened this issue · 3 comments
Describe the bug
nginx-prometheus-exporter returns a 500 status code with the message "metric was collected before with the same name and label values" when Prometheus scrapes metrics. This leads to Prometheus reporting the service/endpoint as down.
This is an intermittent issue for us. We have 1k+ pods with nginx-prometheus-exporter in 13 k8s clusters and occasionally see this issue.
To reproduce
Steps to reproduce the behavior:
- Deploy using the sidecar pattern next to NGINX Plus in a pod; here is the relevant part of the pod spec:
- name: nginx-metrics-exporter
  image: 'nginx/nginx-prometheus-exporter:0.8.0'
  ports:
    - name: metrics-http
      containerPort: 9113
      protocol: TCP
  env:
    - name: NGINX_PLUS
      value: 'true'
    - name: NGINX_RETRIES
      value: '10'
    - name: NGINX_RETRY_INTERVAL
      value: 30s
    - name: SCRAPE_URI
      value: 'http://localhost:52443/api'
    - name: POD_ID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.uid
    - name: CONST_LABELS
      value: 'pod_id=$(POD_ID)'
  resources:
    limits:
      cpu: 20m
      memory: 100Mi
    requests:
      cpu: 5m
      memory: 20Mi
  volumeMounts:
    - name: default-token-tngb5
      readOnly: true
      mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  readinessProbe:
    httpGet:
      path: /
      port: 9113
      scheme: HTTP
    timeoutSeconds: 1
    periodSeconds: 10
    successThreshold: 1
    failureThreshold: 3
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  imagePullPolicy: IfNotPresent
- nginx-prometheus-exporter logs:
2021/12/28 19:41:12 Starting NGINX Prometheus Exporter Version= GitCommit=
2021/12/28 19:41:12 Listening on :9113
2021/12/28 19:41:12 NGINX Prometheus Exporter has successfully started
- Prometheus reports the service as down because it couldn't fetch metrics.
I can validate that via curl:
curl -v http://localhost:9113/metrics
* About to connect() to localhost port 9113 (#0)
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9113 (#0)
> GET /metrics HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:9113
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Tue, 28 Dec 2021 19:52:39 GMT
< Transfer-Encoding: chunked
<
An error has occurred while serving metrics:
28 error(s) occurred:
* collected metric "nginxplus_upstream_server_state" { label:<name:"pod_id" value:"82773ae7-03c5-456f-831d-b5f5a765e7a7" > label:<name:"server" value:"10.10.0.1:80" > label:<name:"upstream" value:"backendhttp" > gauge:<value:1 > } was collected before with the same name and label values
<omitted similar messages>
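For context, this error text comes from the Prometheus Go client's gather step, which rejects any sample whose metric name and full label set have already been seen in the same scrape. The following is a simplified, stdlib-only sketch of that kind of duplicate check, not the exporter's actual code; `sampleKey` and `checkDuplicates` are illustrative names:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sampleKey builds a canonical identity for a sample: metric name plus
// its label pairs sorted by label name. Two samples with the same key
// count as "the same name and label values".
func sampleKey(name string, labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs)
	return name + "{" + strings.Join(pairs, ",") + "}"
}

// checkDuplicates returns one error per sample whose name and label
// values were already collected, similar to the 500 body in this issue.
func checkDuplicates(samples []struct {
	Name   string
	Labels map[string]string
}) []error {
	seen := map[string]bool{}
	var errs []error
	for _, s := range samples {
		key := sampleKey(s.Name, s.Labels)
		if seen[key] {
			errs = append(errs, fmt.Errorf(
				"collected metric %q %v was collected before with the same name and label values",
				s.Name, s.Labels))
			continue
		}
		seen[key] = true
	}
	return errs
}

func main() {
	labels := map[string]string{"upstream": "backendhttp", "server": "10.10.0.1:80"}
	samples := []struct {
		Name   string
		Labels map[string]string
	}{
		{"nginxplus_upstream_server_state", labels},
		{"nginxplus_upstream_server_state", labels}, // duplicate: same name + label values
	}
	for _, err := range checkDuplicates(samples) {
		fmt.Println(err)
	}
}
```

In other words, the intermittent 500 suggests the collector occasionally emits the same upstream-server sample twice within a single scrape.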
Expected behavior
200 OK status code when Prometheus scrapes the metrics
Your environment
- Version of the Prometheus exporter: 0.8.0
- Version of Docker/Kubernetes: Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
- Using NGINX Plus
Additional context
Prometheus scrape config:
- job_name: monitoring/nginx-metrics-exporter/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - ns
Hi @ivanitskiy, thanks for reporting this.
Would it be possible for you to test if you still have this issue with our latest release https://github.com/nginxinc/nginx-prometheus-exporter/releases/tag/v0.10.0?
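For reference, trying the newer release would only require bumping the image tag on the sidecar container defined in the pod spec above (a sketch; the container name is taken from the spec earlier in this issue):

```
- name: nginx-metrics-exporter
  image: 'nginx/nginx-prometheus-exporter:0.10.0'
```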
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 10 days.
This issue was closed because it has been stalled for 10 days with no activity.