facebookincubator/oomd

I'm afraid that PSI measurements are totally off either due to a kernel bug or MuQSS.

hakavlad opened this issue · 1 comment

Hi! See hakavlad/nohang#25 (comment) and below:

A bug with the current PSI handling is that nohang relies solely on PSI to kill/terminate processes.
It killed bees (btrfs dedup) while memory and swap were mostly free.
nohang should also check whether memory usage is increasing or whether available memory is low.
The memory pressure probably happened because I was compiling a new kernel.
Also, I use the MuQSS scheduler with a high-frequency timer (1000 Hz).

Because I run a MuQSS kernel, I am not sure that memory PSI is measured correctly.
It can stay high for many minutes even though processes are idle.

The solution I propose: when PSI monitoring is enabled, please also check that memory is actually short before killing. If there is no shortage of free swap/memory, ignore the high PSI values.
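The proposed check can be sketched in Python. This is a minimal illustration under stated assumptions, not nohang's actual logic: the `MemorySnapshot` type, the `parse_meminfo` and `should_act` helpers, and all three thresholds are hypothetical, and "lack of memory" is assumed to mean that both `MemAvailable` and `SwapFree` from `/proc/meminfo` are below their thresholds.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical thresholds, for illustration only (not nohang defaults).
PSI_SOME_AVG10_THRESHOLD = 60.0        # percent of wall time
MIN_MEM_AVAILABLE_KIB = 512 * 1024     # 512 MiB
MIN_SWAP_FREE_KIB = 512 * 1024         # 512 MiB


@dataclass
class MemorySnapshot:
    psi_some_avg10: float    # "some avg10" from /proc/pressure/memory
    mem_available_kib: int   # MemAvailable from /proc/meminfo (kB)
    swap_free_kib: int       # SwapFree from /proc/meminfo (kB)


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo key (e.g. 'MemAvailable') to its numeric value."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def should_act(s: MemorySnapshot) -> bool:
    """Act only when PSI is high AND memory and swap are both actually short."""
    psi_high = s.psi_some_avg10 >= PSI_SOME_AVG10_THRESHOLD
    memory_short = (s.mem_available_kib < MIN_MEM_AVAILABLE_KIB
                    and s.swap_free_kib < MIN_SWAP_FREE_KIB)
    return psi_high and memory_short


# On a live system the text would come from open("/proc/meminfo").read().
meminfo = parse_meminfo("MemAvailable:    8000000 kB\nSwapFree:        4000000 kB\n")
# High PSI with plenty of free memory and swap, as in the report: no action.
snapshot = MemorySnapshot(99.0, meminfo["MemAvailable"], meminfo["SwapFree"])
print(should_act(snapshot))  # False
```

Requiring both conditions means a stuck or mis-measured PSI value alone can no longer trigger a kill. The trade-off is that genuine thrashing with memory still nominally available would be ignored until the memory thresholds are crossed.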

I ran the program and it did cause high PSI (nohang's PSI monitoring is still disabled).
I killed it and am now monitoring PSI. It is still very high after 10 minutes:
some avg10=99.00 avg60=99.00 avg300=90.78 total=725036638
So I'm afraid that the PSI measurements are totally off, either due to a kernel bug or to MuQSS.
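The `some` line quoted above follows the kernel's PSI format: `avg10`, `avg60` and `avg300` are the percentage of wall-clock time, averaged over 10 s, 60 s and 300 s, during which at least one task was stalled on memory, and `total` is the cumulative stall time in microseconds. A minimal Python sketch for reading such a line follows; the helper names `parse_psi_line` and `stall_fraction` are hypothetical. Converting `total` to seconds shows how much cumulative stall time the sample represents, and the delta between two samples gives the stall fraction over that interval.

```python
from typing import Dict, Tuple


def parse_psi_line(line: str) -> Tuple[str, Dict[str, float]]:
    """Split a PSI line such as 'some avg10=... total=...' into its kind and fields."""
    kind, *fields = line.split()
    values: Dict[str, float] = {}
    for field in fields:
        key, _, raw = field.partition("=")
        # avg* are percentages; total is an integer count of microseconds.
        values[key] = int(raw) if key == "total" else float(raw)
    return kind, values


def stall_fraction(total_before_us: int, total_after_us: int, interval_s: float) -> float:
    """Fraction of wall time stalled between two samples of the 'total' counter."""
    return (total_after_us - total_before_us) / (interval_s * 1_000_000)


sample = "some avg10=99.00 avg60=99.00 avg300=90.78 total=725036638"
kind, v = parse_psi_line(sample)
print(kind, v["avg10"], v["total"] / 1_000_000)  # some 99.0 725.036638

# On a live system one would read /proc/pressure/memory twice, e.g. 10 s apart,
# and pass the two 'total' values to stall_fraction().
```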

Could you comment on this, please?

I left a comment. Unfortunately, this is not really related to oomd, so I'm going to close this out.