node-directory-size-metrics eats up burst balance on AWS
morkeleb opened this issue · 2 comments
We ran the all-manifests setup on our Kubernetes cluster, where we also have a Jenkins instance.
Once Jenkins runs a build, the node-directory-size-metrics pod starts reading from the persistent volume we use for the Jenkins home directory.
This causes a large number of read operations that consume all of our burst credits, effectively stalling disk operations on our Jenkins pod roughly 6 hours later.
Removing node-directory-size-metrics from the deployment stopped the excessive read operations. For comparison: running a build on our machine generates about 16k read operations every 5 minutes, while with node-directory-size-metrics deployed we see over 80k read operations every 5 minutes even when the Jenkins process is idle.
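For reference, this is roughly how we removed it. The sketch below assumes the component is deployed as a DaemonSet named node-directory-size-metrics in the monitoring namespace; adjust the names to match your manifests.

    # Check whether the component runs as a DaemonSet in the monitoring namespace
    # (name and namespace are assumptions; adjust to your manifests).
    kubectl -n monitoring get daemonset node-directory-size-metrics
    # Delete it to stop the directory scans against the Jenkins volume.
    kubectl -n monitoring delete daemonset node-directory-size-metrics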
I'm experiencing the same; here's what cat /proc/1/io gives me:
/ # cat /proc/1/io
rchar: 701229807
wchar: 55178558
syscr: 15223413
syscw: 659897
read_bytes: 647638831104
write_bytes: 4096
cancelled_write_bytes: 0
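To see how quickly that counter grows, here's a quick sketch you can run inside the pod (e.g. via kubectl exec), sampling read_bytes once a minute; the delta between samples is the amount read from disk in that interval:

    # Sample the read_bytes counter from PID 1 every 60 seconds.
    while true; do
      echo "$(date +%T) $(grep read_bytes /proc/1/io)"
      sleep 60
    done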
We are not actively maintaining the manifest YAML anymore, but we are glad to receive contributions. We encourage you to run Prometheus through the Prometheus Operator instead. That setup uses node exporter to collect node metrics, and you can create alerts or dashboards based on them.
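As a rough sketch of what that could look like: once node exporter is scraped by an Operator-managed Prometheus, you can query (or alert on) per-device read IOPS. The service name prometheus-k8s and the monitoring namespace below are assumptions taken from the default kube-prometheus setup, and node_disk_reads_completed_total is the metric name used by recent node exporter versions; adjust to your environment.

    # Port-forward the Prometheus service (name/namespace assumed, see above) and
    # query per-device read IOPS over the last 5 minutes via the HTTP API.
    kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
    curl -s 'http://localhost:9090/api/v1/query' \
      --data-urlencode 'query=rate(node_disk_reads_completed_total[5m])'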