opstree/druid-exporter

Latest version takes forever to respond to the "/metrics" endpoint

Closed this issue · 19 comments

Fresh machine, just installed:

  • 64 CPUs
  • 378 GB RAM
  • Debian 10
  • Golang 1.16.5

I cloned this repo (latest version, an hour ago), got the exporter running, and then tried to scrape the metrics:

~$ time curl localhost:8080/metrics
real	0m25.907s
user	0m0.008s
sys	0m0.098s

CPU load is fine and RAM is mostly free:

load average: 4.22, 2.14, 1.11

Same situation with release v0.10.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Oh no, this is still happening. Wontfix, for real? XD

Hey, apologies for this; I was actually busy with some other stuff. I will try to fix this over the weekend.

Thanks a lot! :D

> Hey apologies for this, actually I was busy with some other stuff. I will try to fix this over the weekend.

Hi! Any news on this? :S

$ time curl druid-exporter.druid:8080/metrics
real	0m0.578s
user	0m0.013s
sys	0m0.033s

Seems like it is fixed now :D

@iamabhishek-dubey latest release is even worse.
When you request /metrics it takes forever, then starts an endless loop spitting out metrics.

real 3m27.158s
user 0m0.124s
sys 0m2.159s

This was a curl I ended up killing with Ctrl+C.

Quite strange, because I tested it on our Druid cluster yesterday and it seems to work fine for us. Can you please tell me which version of Druid you are using?

0.20.1. Don't get me wrong: the scraper gets the data, the problem is that druid-exporter takes some time to answer, and then keeps answering forever. I will try to record a GIF now to show you.
Second try (also killed with Ctrl+C):
real 5m0.205s
user 0m0.245s
sys 0m3.151s

No, I guess that won't be needed. Let me check this today.

Thanks a lot, man. And I'm sorry to bring the bad news :(

Nope, that's fine; this is the reason for making this project open source.

Hey @tanisdlj

Can you please test this image: quay.io/opstree/druid-exporter:v0.11-pre
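For anyone else wanting to try it, one way to run the pre-release image locally might be the following (the port 8080 assumption comes from the curl commands earlier in this thread; adjust flags for your Druid setup):

```shell
# Assumes the exporter listens on 8080, as the earlier curl commands suggest.
docker run --rm -p 8080:8080 quay.io/opstree/druid-exporter:v0.11-pre

# Then, in another terminal, time a scrape:
time curl localhost:8080/metrics
```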

Have you tested this image?

Hey!
I tested this image in my production environment (more than 2k metrics) because I have the same issue with version 0.11.
Metrics are correctly displayed, with a response time between 25 and 45 seconds.
Btw, it seems there is a memory leak or something strange: the more you call the /metrics page, the longer the container takes to respond, until it runs out of memory and is OOM-killed by the system.

Can you please try adjusting this flag value:

metrics-cleanup-ttl

By default it's 5 minutes; try setting it to 1 minute.
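For reference, a hedged example of passing that flag on the command line (the flag name and the `=1` value format appear later in this thread; the binary path is an assumption):

```shell
# Hypothetical invocation; the binary path is an assumption.
./druid-exporter --metrics-cleanup-ttl=1
```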

Hi @iamabhishek-dubey! So sorry, I had a crazy week and didn't have a minute to spare. Will test that image ASAP, probably Monday :/

Sure. Also try setting the TTL to 1 minute.

Hey, I tested on my side and I still have the same problem with --metrics-cleanup-ttl=1.