/health endpoint returns 404
I tried updating to v0.17.0 of the stackdriver-exporter container, but the pod never reaches a running state because it fails its liveness probes. Queries to the /metrics endpoint work as expected, but /health returns 404 errors. I looked for an updated Helm chart for v0.17.0, but one has not been released yet.
foo:~# curl 10.244.1.17:9255/metrics
# HELP go_gc_duration_seconds A summary of the wall-time pause (stop-the-world) duration in garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
<truncated>
foo:~# curl 10.244.1.17:9255/health
404 page not found
Containers:
  prometheus-stackdriver-exporter:
    Container ID: containerd://0e46d42f0432dfd3dcc37acd6f9b78edbcc6ae1c71c907b89d2ea67d55aa4269
    Image: prometheuscommunity/stackdriver-exporter:v0.17.0
    Image ID: docker.io/prometheuscommunity/stackdriver-exporter@sha256:ca514180d5f5e4997e78f94ad23a08d7ad81b932485bd2152c98504cb38c1fdb
    Port: 9255/TCP
    Host Port: 0/TCP
    Command:
      stackdriver_exporter
    Args:
      --google.project-id=<REMOVED>
      --monitoring.metrics-interval=5m
      --monitoring.metrics-offset=0s
      --monitoring.metrics-type-prefixes=compute.googleapis.com/instance/cpu
      --stackdriver.backoff-jitter=1s
      --stackdriver.http-timeout=10s
      --stackdriver.max-backoff=5s
      --stackdriver.max-retries=0
      --stackdriver.retry-statuses=503
      --web.listen-address=:9255
      --web.telemetry-path=/metrics
    State: Waiting
      Reason: CrashLoopBackOff
    Last State: Terminated
      Reason: Error
      Exit Code: 2
      Started: Thu, 14 Nov 2024 10:55:31 -0700
      Finished: Thu, 14 Nov 2024 10:56:21 -0700
    Ready: False
    Restart Count: 5
    Liveness: http-get http://:http/health delay=30s timeout=10s period=10s #success=1 #failure=3
    Readiness: http-get http://:http/health delay=10s timeout=10s period=10s #success=1 #failure=3
I upgraded to v0.17.0 this morning and hit the same issue with the /health endpoint. Deploying with the https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-stackdriver-exporter chart (v4.6.2) puts the pod into a crash loop.
As a workaround, you can modify the liveness and readiness probes in the chart to point to / instead of /health, and the pod will reach a ready state. The /health endpoint should still be restored.
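For reference, here is roughly what the changed probes could look like in the rendered container spec. This is a sketch, not chart values: the thread does not show how the chart exposes these settings, so the fragment uses standard Kubernetes probe fields, with the timings and the `http` port name copied from the pod description in the original report.

```yaml
# Fragment of a container spec (standard Kubernetes fields, not chart values).
# Only the path differs from the probes shown in the original report.
livenessProbe:
  httpGet:
    path: /          # was /health, which returns 404 on v0.17.0
    port: http
  initialDelaySeconds: 30
  timeoutSeconds: 10
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /          # was /health
    port: http
  initialDelaySeconds: 10
  timeoutSeconds: 10
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
```

Treat this as a stopgap: / only shows that the HTTP server is answering, which may be a weaker signal than a dedicated health endpoint.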
I'm also running into this issue