opstree/druid-exporter

Latest version takes forever to respond to the "/metrics" endpoint

Closed this issue · 19 comments

Fresh machine, just installed:

  • 64 CPUs
  • 378 GB RAM
  • Debian 10
  • Golang 1.16.5

I cloned this repo (latest version, an hour ago), got the exporter running, and then tried to scrape the metrics:

~$ time curl localhost:8080/metrics
real	0m25.907s
user	0m0.008s
sys	0m0.098s

CPU load is fine and RAM is mostly free:

load average: 4.22, 2.14, 1.11

Same situation with release v0.10.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Oh no, this is still happening. Wontfix, for real? XD

Hey, apologies for this; I was actually busy with some other stuff. I will try to fix this over the weekend.

Thanks a lot! :D

> Hey apologies for this, actually I was busy with some other stuff. I will try to fix this over the weekend.

Hi! Any news on this? :S

$ time curl druid-exporter.druid:8080/metrics
real	0m0.578s
user	0m0.013s
sys	0m0.033s

Seems like it is fixed now :D

@iamabhishek-dubey latest release is even worse.
When you request /metrics it takes forever, then starts an endless loop spitting out metrics.

real 3m27.158s
user 0m0.124s
sys 0m2.159s

This was a curl I ended up killing with Ctrl+C.

Quite strange, because I tested it on our Druid cluster yesterday and it seems to work fine for us. Can you please tell me which version of Druid you are using?

0.20.1. Don't get me wrong: the scraper gets the data, the problem is that druid-exporter takes some time to answer, and then keeps answering forever. I will try to record a GIF now to show you.
Second try (also killed with Ctrl+C):
real 5m0.205s
user 0m0.245s
sys 0m3.151s

No, I guess that won't be needed. Let me check this today.

Thanks a lot, man. And I'm sorry to bring the bad news :(

Nope, that's fine; this is the reason for making this project open source.

Hey @tanisdlj

Can you please test this image: quay.io/opstree/druid-exporter:v0.11-pre
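For anyone else wanting to try it, one way to run the pre-release image locally might be the following (the port 8080 assumption comes from the curl commands earlier in this thread; adjust flags for your Druid setup):

```shell
# Assumes the exporter listens on 8080, as the earlier curl commands suggest.
docker run --rm -p 8080:8080 quay.io/opstree/druid-exporter:v0.11-pre

# Then, in another terminal, time a scrape:
time curl localhost:8080/metrics
```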

Have you tested this image?

Hey!
I tested this image in my production environment (more than 2k metrics) because I have the same issue with version 0.11.
Metrics are correctly displayed, with a response time between 25 and 45 seconds.
Btw, it seems there is a memory leak or something strange: the more you call the /metrics page, the longer the container takes to respond, until it runs out of memory and is OOM-killed by the system.

Can you please try adjusting this flag value:

metrics-cleanup-ttl

By default it's 5 minutes; try setting it to 1 minute.
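For reference, a hedged example of passing that flag on the command line (the flag name and the `=1` value format appear later in this thread; the binary path is an assumption):

```shell
# Hypothetical invocation; the binary path is an assumption.
./druid-exporter --metrics-cleanup-ttl=1
```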

Hi @iamabhishek-dubey! So sorry, I had a crazy week and didn't have a minute to spare. Will test that image ASAP, probably Monday :/

Sure. Also try setting the TTL to 1 minute.

Hey, I tested on my side and I still have the same problem with --metrics-cleanup-ttl=1.