Questions on `analyze.py`
Opened this issue · 2 comments
Hello!
Thanks for your post under https://unix.stackexchange.com/questions/415814/memory-runs-full-over-time-high-buffer-cache-usage-low-available-memory/456688#456688?newreg=86be4a98bf95414cae594bd460a38068. It led me to this repo of yours.
I have been suspicious of the same behaviour on my machine, and, surprise, I also used xflux! So I have now uninstalled it.
However, I am still witnessing a slow and steady increase in `SUnreclaim` under `/proc/meminfo`, so I decided to use your scripts to investigate further. In order to do so efficiently, I've got a few questions about your code in `analyze.py`:
- why are you limiting outputs to processes that have been running only between 20% and 80% of the time?
- what is the rationale behind the `coef < np.max(diff) / 3` threshold?
Thanks in advance for your work, and hope to get answers to these questions.
The idea is to start `record.py` while the process you suspect isn't running and let it record for a while, then start the process and keep recording for a similar amount of time. `analyze.py` tries to correlate the growth of the slab with the process being active or not. Since it needs enough data both where the process is active and where it isn't, it filters out processes for which this is not the case with the check `if not 0.2 < np.mean(running) < 0.8`.
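To illustrate that filter, here is a minimal hedged sketch (the `running` array below is made up for the example; it assumes one 0/1 entry per recorded sample indicating whether the suspect process was alive):

```python
import numpy as np

# Hypothetical example: one sample per recording interval,
# 1 = process was running, 0 = it wasn't.
running = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 1])

# np.mean(running) is the fraction of samples with the process active.
# Outside the 20%-80% band there is too little data in one of the two
# states to tell "active" growth apart from "inactive" growth.
if not 0.2 < np.mean(running) < 0.8:
    print("skipped: not enough samples in one of the two states")
else:
    print(f"kept: active {np.mean(running):.0%} of the time")  # → kept: active 60% of the time
```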
As for `coef < np.max(diff) / 3`, I'm not 100% sure as it's been a while since I wrote this, but it looks to me like it's removing processes whose correlation with slab growth is very small.
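As a rough sketch of how such a cutoff behaves (the values and variable meanings below are assumptions, not taken from `analyze.py`: `diff` stands for per-interval slab growth and `coef` for a fitted coefficient tying growth to the process's activity):

```python
import numpy as np

# Hypothetical data: per-interval growth of a slab counter, and a fitted
# coefficient estimating how much growth coincides with process activity.
diff = np.array([12.0, 3.0, 15.0, 9.0, 6.0])
coef = 2.0

# If the fitted contribution is below a third of the largest observed
# growth step, the process is treated as weakly correlated and dropped.
if coef < np.max(diff) / 3:
    print("dropped: correlation with slab growth too weak")
```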
Hope that helps!
Many thanks, Max.
I only started your `record.py` long after my suspicious process had been running. I'll try that next, as you suggest.