facebookincubator/oomd

doesn't kill cgroup, unable to set xattr trusted.oomd_ooms=1

nartes opened this issue · 5 comments

Description: oomd has identified a process, but can't kill it.

Package: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=oomd

Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/util/Fs.cpp:576] Unable to set xattr trusted.oomd_ooms=1 on /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/gnome-launched-firefox-11870.scope. errno=30
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/plugins/BaseKillPlugin.cpp:96] Trying to kill /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/gnome-launched-firefox-11870.scope
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/plugins/KillMemoryGrowth-inl.h:168] Picked "user.slice/user-1000.slice/user@1000.service/gnome-launched-firefox-11870.scope" (2040MB) based on size > 10% of total 6989MB (size threshold overridden)
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/util/Fs.cpp:576] Unable to set xattr trusted.oomd_kill=0 on /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service. errno=30
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/plugins/BaseKillPlugin.cpp:141] Killed 0: 1377(ssh-agent)[E1] 1401(tmux: server)[E1] 1402(zsh)[E1] 1427(zsh)[E1] 1454(vim)[E1] 1455(zsh)[E1] 1485(htop)[E1] 1496(zsh)[E1] 1521(zsh)[E1] 46339(zsh)[E1] 4>
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/util/Fs.cpp:576] Unable to set xattr trusted.oomd_ooms=1 on /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service. errno=30
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/plugins/BaseKillPlugin.cpp:96] Trying to kill /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/plugins/KillMemoryGrowth-inl.h:168] Picked "user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service" (2370MB) based on size > 10% of total 6989MB (size threshold overridden)
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/OomdContext.cpp:163]   io_cost_cumulative=0 io_cost_rate=0
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/OomdContext.cpp:156]   mem=8MB mem_avg=7MB mem_low=0MB mem_min=0MB mem_prot=0MB anon=6MB swap_usage=0MB
Mar 06 17:53:21 MACHINE_NAME oomd[69346]: [../src/oomd/OomdContext.cpp:151]   pressure=0:0:0-0:0:0

In

1377(ssh-agent)[E1] 1401(tmux: server)[E1] 1402(zsh)[E1]

E1 means kill(pid, SIGKILL) failed with EPERM (errno 1). Is oomd running with the right permissions?
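
One way to approach the permissions question is to inspect the daemon's effective capabilities, because a process running as root with a reduced capability set can still get EPERM from kill. The following is a minimal sketch, assuming the standard Linux /proc/<pid>/status layout. The helper name `effective_caps` and the sample CapEff value are illustrative and do not come from oomd or from this report.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAP_KILL = 5
CAP_SYS_ADMIN = 21


def effective_caps(status_text):
    """Parse the CapEff line of /proc/<pid>/status into a dict of booleans."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal bit field.
            mask = int(line.split()[1], 16)
            return {
                "CAP_KILL": bool((mask >> CAP_KILL) & 1),
                "CAP_SYS_ADMIN": bool((mask >> CAP_SYS_ADMIN) & 1),
            }
    raise ValueError("no CapEff line found")


# Hypothetical sample: a mask with only CAP_SYS_ADMIN (bit 21) set.
sample = "Name:\toomd\nCapEff:\t0000000000200000\n"
print(effective_caps(sample))
# prints {'CAP_KILL': False, 'CAP_SYS_ADMIN': True}
```

On a live system the input would be the contents of /proc/<pid>/status for the oomd PID. CAP_KILL matters for signalling processes owned by other users, and CAP_SYS_ADMIN is needed for xattrs in the trusted namespace.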

In

Unable to set xattr trusted.oomd_kill=0 on /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service. errno=30

errno=30 means setxattr failed with EROFS (read-only file system).
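
The numeric codes in these logs can be decoded with Python's standard `errno` module. This is a small sketch, assuming Linux errno numbering; the helper name `describe_errno` is made up for illustration.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, human-readable message) for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# The two codes that appear in this report: E1 from kill, errno=30 from setxattr.
for code in (1, 30):
    name, message = describe_errno(code)
    print(code, name, message)
```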

Are you using a hybrid cgroup1 + cgroup2 setup? It may be unrelated, but it would be good to know.
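
Both the hybrid question and the EROFS error can be checked from the mount table. Below is a minimal sketch that parses text in /proc/mounts format and reports each cgroup mount together with whether it carries the `ro` option. The function names and the sample mount lines are assumptions for illustration, not output from the reporter's machine.

```python
def cgroup_mounts(mounts_text):
    """Return (mount_point, fs_type, read_only) for each cgroup mount."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _, mount_point, fs_type, options = fields[:4]
        if fs_type in ("cgroup", "cgroup2"):
            found.append((mount_point, fs_type, "ro" in options.split(",")))
    return found


def is_hybrid(mounts):
    """True when both cgroup v1 and cgroup v2 hierarchies are mounted."""
    return {fs_type for _, fs_type, _ in mounts} == {"cgroup", "cgroup2"}


# Hypothetical sample: a single cgroup2 hierarchy mounted read-only.
sample = """\
cgroup2 /sys/fs/cgroup cgroup2 ro,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""
mounts = cgroup_mounts(sample)
print(mounts, is_hybrid(mounts))
# prints [('/sys/fs/cgroup', 'cgroup2', True)] False
```

Feeding it /proc/<pid>/mounts for the oomd PID, rather than /proc/mounts from a shell, is the safer choice, because a service can run in its own mount namespace and see a different set of mount options.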

@danobi perhaps it is an issue with the cgroup permissions setup.
Could you suggest some bash commands for debugging the kill procedure?
I haven't read the source yet, but I thought about just hacking it with system('kill -9 %d', process_pid) in Fs.cpp instead of cryptic trusted.oomd_kill = 0 attributes.
What is this attribute, is it related to a facebook contributed kernel module?
I couldn't find any documentation on the kill procedure used by oomd.
It puzzles me at the moment.

P.S.

  1. cgroup configuration on Arch Linux: https://aur.archlinux.org/cgit/aur.git/commit/?h=oomd&id=3a6dcdb577bfa3c874894889315f0c940174bf73
  2. Some kernel parameters in the PKGBUILD: https://aur.archlinux.org/cgit/aur.git/commit/?h=oomd&id=3a6dcdb577bfa3c874894889315f0c940174bf73

P.P.S.

systemctl status oomd
● oomd.service - userspace out-of-memory killer
     Loaded: loaded (/usr/lib/systemd/system/oomd.service; enabled; vendor preset: disabled)
     Active: active (running) since Sun 2020-03-08 22:35:50 +03; 24h ago
    Process: 584 ExecStartPre=/usr/bin/oomd --check-config ${OOMD_CONFIG} (code=exited, status=0/SUCCESS)
   Main PID: 594 (oomd)
      Tasks: 3 (limit: 9336)
     Memory: 2.6M (low: 64.0M)
        CPU: 9min 34.582s
     CGroup: /system.slice/oomd.service
             └─594 /usr/bin/oomd --config /etc/oomd.json --interval 5

Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/OomdContext.cpp:156]   mem=11MB mem_avg=11MB mem_low=0MB mem_min=0MB mem_prot=0MB anon=6MB swap_usage=0MB
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/OomdContext.cpp:163]   io_cost_cumulative=0 io_cost_rate=0
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/OomdContext.cpp:150] name=user.slice/user-1000.slice/user@1000.service/gsd-media-keys.service
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/OomdContext.cpp:151]   pressure=0:0:0-0:0:0
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/OomdContext.cpp:156]   mem=8MB mem_avg=8MB mem_low=0MB mem_min=0MB mem_prot=0MB anon=6MB swap_usage=0MB
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/OomdContext.cpp:163]   io_cost_cumulative=0 io_cost_rate=0
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/plugins/KillMemoryGrowth-inl.h:168] Picked "user.slice/user-1000.slice/user@1000.service/gnome-launched-firefox-29108.scope" (2519MB) based on size > 10% of tot>
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/plugins/BaseKillPlugin.cpp:92] OOMD: In dry-run mode; would have tried to kill /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/gnome-launched-firefo>
Mar 09 22:38:21 MACHINE_NAME oomd[594]: [../src/oomd/Log.cpp:114] 0.00 0.00 0.00 user.slice/user-1000.slice/user@1000.service/gnome-launched-firefox-29108.scope 2641997824 ruleset:[user session protection] detecto>
Mar 09 22:38:22 MACHINE_NAME oomd[594]: [../src/oomd/engine/Ruleset.cpp:134] Action=kill_by_memory_size_or_growth returned STOP. Terminating action chain.

P.P.P.S.
The /usr/bin/oomd process runs as the root user.

P.P.P.P.S.

yay -Qs cgroup
local/libcgroup 0.41-2
    Library that abstracts the control group file system in Linux

This is the kill code: https://github.com/facebookincubator/oomd/blob/master/src/oomd/plugins/BaseKillPlugin.cpp#L138

> cryptic trusted.oomd_kill = 0 attributes. What is this attribute, is it related to a facebook contributed kernel module?

It's an extended attribute; see man 7 xattr for more details. oomd sets it so that delegated cgroup subtrees can tell when a kill was performed.
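
To make the mechanism concrete, here is a minimal sketch of setting an extended attribute from Python and reporting the errno name when the call fails. It uses an attribute in the `user.` namespace on a temporary file so that it can run unprivileged. The attribute name `user.demo_kill` and the helper `try_set_xattr` are made up for illustration and are not what oomd uses; the `trusted.` namespace seen in the logs requires CAP_SYS_ADMIN. `os.setxattr` is only available on Linux.

```python
import errno
import os
import tempfile


def try_set_xattr(path, name, value):
    """Set an xattr; return None on success or the errno name on failure."""
    try:
        os.setxattr(path, name, value)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


with tempfile.NamedTemporaryFile() as tmp:
    # Hypothetical attribute name; the result depends on the file system
    # backing the temporary directory, which may not support user xattrs.
    result = try_set_xattr(tmp.name, "user.demo_kill", b"1")
    print("ok" if result is None else "failed with " + result)
```

Run against a path on a read-only mount, the same helper would be expected to return EROFS, which matches the errno=30 in the report.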

Your systemctl status oomd output shows that dry-run mode is on. With dry-run mode enabled for the plugin, the earlier log messages could not have been printed. Are you sure both outputs come from the same setup?

Can you also share the oomd config you're using?

It is in dry-run mode now, but the problem above was reported with dry-run off. I enabled dry-run mode to debug the kill selection.

//
// Basic configuration for a desktop linux machine
//

{
    "rulesets": [
        {
            "name": "user session protection",
            "detectors": [
                [
                    "user pressure above 60 for 30s",
                    {
                        "name": "memory_above",
                        "args": {
                            "cgroup": "user.slice",
                            "threshold": "80%",
                            "duration": "1"
                        }
                    }
                ]
            ],
            "actions": [
                {
                    "name": "kill_by_memory_size_or_growth",
                    "args": {
                        "cgroup": "user.slice/user-*.slice/user@*.service/*",
                        "size_threshold": 10,
                        "post_action_delay": 1,
                        "dry": true
                    }
                }
                }
            ]
        }
    ]
}
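
Because the `dry` argument decides whether a kill is carried out or only logged, it can help to list which actions in a config have it set before comparing logs. The sketch below is an assumption-laden helper, not part of oomd: it drops whole-line `//` comments (it would not handle trailing comments or `//` inside strings) and then uses Python's `json` module. The function names and the trimmed sample are illustrative.

```python
import json


def load_oomd_config(text):
    """Drop whole-line // comments, then parse the remainder as JSON."""
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def dry_actions(config):
    """Yield (ruleset name, action name) for every action with dry enabled."""
    for ruleset in config.get("rulesets", []):
        for action in ruleset.get("actions", []):
            dry = action.get("args", {}).get("dry", False)
            # Accept both the JSON boolean true and the string "true".
            if str(dry).lower() == "true":
                yield ruleset["name"], action["name"]


# Trimmed copy of the config above, kept short for the example.
sample = """
// Basic configuration for a desktop linux machine
{
    "rulesets": [
        {
            "name": "user session protection",
            "actions": [
                {
                    "name": "kill_by_memory_size_or_growth",
                    "args": {"size_threshold": 10, "dry": true}
                }
            ]
        }
    ]
}
"""
print(list(dry_actions(load_oomd_config(sample))))
# prints [('user session protection', 'kill_by_memory_size_or_growth')]
```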