lvmteam/lvm2

Mark lvm2 as ineligible for the OOM killer

Closed this issue · 7 comments

lvm2 should be marked as ineligible for the OOM killer whenever being killed would leave the system in an inconsistent state.

We are already marked as a privileged process with increased priority - so we are pretty much last in any killable queue (all the non-root, standard-priority userspace tasks would need to be gone first).

It's worth noting that lvm2 depends on a functioning systemd & systemd-udevd - so when the OOM killer decides to kill those, we might get into a blocked scenario...

> It's worth noting that lvm2 depends on a functioning systemd & systemd-udevd - so when the OOM killer decides to kill those, we might get into a blocked scenario...

PID 1 is ineligible in any case (if it exits, the kernel panics). systemd-udevd already has a very low OOM score adjustment (-900).

> We are already marked as a privileged process with increased priority - so we are pretty much last in any killable queue (all the non-root, standard-priority userspace tasks would need to be gone first).

This seems fragile and dependent on specific kernel heuristics. To me, setting the score to -1000 (disabling the OOM killer outright) seems safer, but I could be wrong. Perhaps this could be a config option?

If there were potentially a bug in lvm2's memory consumption, we would probably still be fine with being killed - it might leave the system more operational than if we kept eating memory.
So if the system is that short of memory, lvm2 should not be an obstacle.
We are not a 'kernel' task - there are only a few, usually 'short', critical sections - and since we make sure we are memory-'locked' beforehand and have the highest priority, there is a very low chance we would ever get killed inside a critical section - and if we were, the user has far bigger problems...
Since our way of 'locking' memory into RAM is quite tricky, we prefer to keep the door 'open' and let the kernel kill us if we take 'too much'...

So we do not consider this to be an issue...

BTW - OOM handling is now in the hands of /usr/lib/systemd/systemd-oomd.
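For reference, both the kernel OOM killer score and systemd-oomd's behaviour can be influenced per unit; a sketch using directives from systemd.exec(5) and systemd.resource-control(5), with illustrative values for a hypothetical service:

```ini
# Illustrative drop-in for a hypothetical lvm2-related service.
[Service]
OOMScoreAdjust=-1000        # kernel OOM killer: exempt this process
ManagedOOMPreference=avoid  # systemd-oomd: prefer not to kill this unit
```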

Also I assume that systemd and systemd-udevd do not mlockall(), so I would expect that blocking on them while there are suspended devices is unsafe.

Yes, there are some 'problematic' dependencies - however, so far we have not received any related reports - so it's more or less an academic discussion about absolute correctness...

> Yes, there are some 'problematic' dependencies - however, so far we have not received any related reports - so it's more or less an academic discussion about absolute correctness...

This could well be because the failure mode (a system freeze) is practically impossible to track down. I have had system freezes in the past that might have been due to lvm2, but I have no way of knowing whether they were. Servers likely do not run lvm2 commands as often and so are less likely to hit such issues. Therefore, I still think this should be fixed, so that any blocking happens after the critical section.

We are more focused on 'real' bugs - these academic ones are just standing deep in the queue... ;)
So if there is a real reproducer for this issue, we can consider trying to fix it.
Without a reproducer it's very hard to even prove that something was fixed...