ricoberger/script_exporter

[misleading title] script_exporter version 2.15 and later give significantly higher CPU usage compared to earlier versions

eliasrudberg opened this issue · 7 comments

Hello, I have a problem with significantly increased CPU usage when upgrading from script_exporter 2.14 to 2.15 and later versions. I suppose this must be due to one of the changes made from 2.14 to 2.15 but looking at those changes I don't see why they would give worse performance. What could it be? Could the "Update Dependencies" change #95 have such an effect?

As far as I can tell script_exporter still does its job, it just consumes more CPU resources, for some reason.

Hi @eliasrudberg, do you run the script_exporter via Docker?

Hi @ricoberger and thanks for answering! No, not using Docker.

Thanks for the clarification. Unfortunately, I'm not able to reproduce the issue; for me the CPU and memory usage of v2.14.0 and v2.15.0 are nearly the same:

(Screenshots: CPU usage and memory usage graphs for v2.14.0 vs. v2.15.0)

Can you maybe check the script_duration_seconds metric for each of your scripts, to see whether the higher CPU usage could come from a longer execution time of one of the scripts?
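A minimal sketch of that check. It assumes you have the exporter's Prometheus text-format output as a string, and that `script_duration_seconds` carries a label named `script` holding the script name (an assumption, so verify it against your own output). It extracts one duration per script and lists the slowest ones, which makes it easy to compare runs of two exporter versions side by side. The helper names `parse_durations` and `slowest` are hypothetical.

```python
import re

# Matches lines like: script_duration_seconds{script="ping"} 0.25
# Assumption: the script name is carried in a label called "script".
_LINE = re.compile(
    r'^script_duration_seconds\{[^}]*\bscript="([^"]*)"[^}]*\}\s+'
    r'([+-]?Inf|NaN|[0-9.eE+-]+)'
)


def parse_durations(exposition: str) -> dict[str, float]:
    """Return {script_name: duration_seconds} from Prometheus text-format output."""
    durations: dict[str, float] = {}
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        m = _LINE.match(line)
        if m:
            durations[m.group(1)] = float(m.group(2))
    return durations


def slowest(durations: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n scripts with the longest duration, longest first."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical sample output; real values come from your own exporter.
sample = """# HELP script_duration_seconds Script execution time, in seconds.
# TYPE script_duration_seconds gauge
script_duration_seconds{script="ping"} 0.25
script_duration_seconds{script="disk_check"} 1.75
script_success{script="ping"} 1
"""

durations = parse_durations(sample)
print(slowest(durations))
```

Running this once against output collected under each exporter version shows whether a particular script's duration grows after the upgrade, or whether all scripts slow down together, which points more toward overall CPU contention on the host.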

@ricoberger thanks, I have now checked script_duration_seconds, and indeed I can see an increase each time I tried version 2.15 or later; each time I switched back to version 2.14, script_duration_seconds went back to normal. So it looks like the execution time of the scripts increases. However, that is not necessarily the cause of the problem; it could be a side effect of the server hitting 100% CPU usage. Is there some change between the versions that could affect the execution time of the scripts?

Thanks for looking into this. There aren't that many changes between the two versions. Did you also try version 2.14.1?

I'm currently a bit lost as to what could cause the higher CPU usage you see.

@ricoberger after investigating my problem further, I have come to the conclusion that it had nothing to do with the script_exporter version after all. I had tested several times, but not enough, and I jumped to that conclusion too quickly.

I still don't completely understand the problem I had, but I think it is approximately the following: there are many Prometheus targets for which script_exporter is invoked, and sometimes the server running script_exporter gets overwhelmed in a way that makes the CPU usage hit the roof. That seems to happen partly because many script_exporter processes run in parallel and somehow slow each other down. Something like that. The overload can sometimes be triggered by a script_exporter restart, and such a restart happens whenever a new script_exporter version is installed. I now think it was just a coincidence that the problem seemed to appear with a certain script_exporter version.

I was able to avoid the problem by moving part of the workload to another server, and have since tested both script_exporter version 2.15 and the latest 2.17 without seeing problems. If I increase the workload again, the problem comes back, regardless of the script_exporter version.

It's still a mystery to me why a fairly small increase in the workload would cause such a large increase in CPU usage, but that is not necessarily a script_exporter issue but could be something weird with how my server handles a situation with many parallel processes. I don't know.

So I will close this issue, and I apologize for causing confusion. Thanks a lot for your replies, the hint about looking at script_duration_seconds was very helpful and will most likely be useful also in other troubleshooting.

Thanks for the detailed feedback @eliasrudberg