Latest version takes forever to respond on the "/metrics" endpoint
Fresh machine, just installed:
- 64 CPUs
- 378 GB RAM
- Debian 10
- Golang 1.16.5
Cloned this repo (latest version, as of an hour ago), got the exporter running, then tried to scrape the metrics:
~$ time curl localhost:8080/metrics
real 0m25.907s
user 0m0.008s
sys 0m0.098s
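For reference, the setup steps were roughly the following (the clone URL and the plain go build are assumptions on my side; any exporter flags are omitted):
$ git clone https://github.com/opstree/druid-exporter.git
$ cd druid-exporter
$ go build -o druid-exporter .
$ ./druid-exporter    # serving on :8080, as scraped above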
CPU load is fine, RAM is barely used:
load average: 4.22, 2.14, 1.11
Same situation with release v0.10.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Oh no, this is still happening. Wontfix, for real? XD
Hey, apologies for this; I was busy with some other stuff. I will try to fix this over the weekend.
Thanks a lot! :D
Hi! Any news on this? :S
$ time curl druid-exporter.druid:8080/metrics
real 0m0.578s
user 0m0.013s
sys 0m0.033s
Seems like it is fixed now :D
@iamabhishek-dubey the latest release is even worse.
When you request /metrics it takes forever, then it starts an endless loop, spitting out metrics.
real 3m27.158s
user 0m0.124s
sys 0m2.159s
This was a curl I ended up killing with Ctrl+C
Quite strange, because I tested it on the Druid cluster yesterday and it seemed to work fine for us. Can you please tell me which version of Druid you are using?
0.20.1. Don't get me wrong, the scraper gets the data; the problem is that druid-exporter takes a while to answer, then answers forever. I will try to record a gif now to show you.
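To capture this in text instead of a gif, something like the following might work (the 30-second cutoff is arbitrary; timeout is from GNU coreutils):
$ timeout 30 curl -s localhost:8080/metrics | wc -c    # bytes received before the cutoff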
Second try (also killed with ctrl+c):
real 5m0.205s
user 0m0.245s
sys 0m3.151s
No, I guess that will not be needed. Let me check this today.
Thanks a lot, man. And I am sorry to bring bad news :(
Nope, that's fine. This is the reason for making this project open source.
Hey @tanisdlj
Can you please test this image: quay.io/opstree/druid-exporter:v0.11-pre
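Something like this should be enough to try it out (assuming the exporter listens on 8080 inside the container and needs no extra flags):
$ docker run --rm -p 8080:8080 quay.io/opstree/druid-exporter:v0.11-pre
$ time curl localhost:8080/metrics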
Have you tested this image?
Hey!
I tested this image in my production environment (more than 2k metrics) because I have the same issue with version 0.11.
Metrics are correctly displayed with a response time between 25 and 45 seconds.
Btw, it seems there is a memory leak or something strange: the more you call the /metrics page, the longer the container takes to respond, until it runs out of memory and is OOM-killed by the system.
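A rough way to reproduce it (the iteration count and container name are just examples): hit /metrics in a loop and watch the per-request time and container memory climb:
$ for i in $(seq 1 50); do curl -so /dev/null -w "%{time_total}s\n" localhost:8080/metrics; done
$ docker stats --no-stream druid-exporter    # check memory after the loop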
Can you please try adjusting this flag value:
metrics-cleanup-ttl
By default it's 5 minutes; try setting it to 1 minute.
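For example (passed as a regular command-line flag, in the form confirmed further down in this thread):
$ ./druid-exporter --metrics-cleanup-ttl=1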
Hi @iamabhishek-dubey! So sorry, I had a crazy week and didn't have a minute to spare. Will test that image ASAP, probably Monday :/
Sure. Also try setting the TTL to 1 minute.
Hey, I tested on my side, and with --metrics-cleanup-ttl=1 I still have the same problem.