False positive for containerd
Closed this issue · 22 comments
IMPORTANT: this report is for needrestart v3.6, since that is the latest version available in Ubuntu's repositories. If it was fixed in one of the later releases, my apologies; I don't see anything relevant in the changelog, though.
Affected software:
ubuntu 24.04
needrestart: 3.6 (3.6-7ubuntu4.3)
containerd: 1.7.12-0ubuntu4.1
# needrestart -rl -p -v
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v3.6
[main] running in root mode
[main] systemd detected
[main] vm detected
[Core] #813 is a NeedRestart::Interp::Python
[Python] #813: source=/usr/bin/networkd-dispatcher
[Core] #860 is a NeedRestart::Interp::Python
[Python] #860: source=/usr/share/unattended-upgrades/unattended-upgrade-shutdown
[main] #1789 uses obsolete binary /pause
[main] #1789 is a child of #1571
[main] #1800 uses obsolete binary /pause
[main] #1800 is a child of #1572
[main] #1801 uses obsolete binary /pause
[main] #1801 is a child of #1567
[main] #1815 uses obsolete binary /pause
[main] #1815 is a child of #1559
[main] #1816 uses obsolete binary /pause
[main] #1816 is a child of #1564
[main] #1819 uses obsolete binary /pause
[main] #1819 is a child of #1643
[main] #1825 uses obsolete binary /pause
[main] #1825 is a child of #1732
[main] #1842 uses obsolete binary /pause
[main] #1842 is a child of #1731
[main] #1997 uses obsolete binary /usr/bin/cadvisor
[main] #1997 is a child of #1567
[main] #2011 uses obsolete binary /csi-node-driver-registrar
[main] #2011 is a child of #1643
[main] #2077 uses obsolete binary /bin/node_exporter
[main] #2077 is a child of #1572
[main] #2078 uses obsolete binary /node-problem-detector
[main] #2078 is a child of #1732
[main] #2391 uses obsolete binary /usr/local/bin/kube-router
[main] #2391 is a child of #1559
[main] #2779 uses obsolete binary /usr/local/bin/crowdsec
[main] #2779 is a child of #1731
[main] #3905 uses obsolete binary /pause
[main] #3905 is a child of #3872
[main] #3970 uses obsolete binary /pause
[main] #3970 is a child of #3940
[main] #4194 uses obsolete binary /speaker
[main] #4194 is a child of #1564
[main] #4371 uses obsolete binary /usr/bin/ceph-exporter
[main] #4371 is a child of #3940
[main] #4611 uses obsolete binary /pause
[main] #4611 is a child of #4592
[main] #5337 uses obsolete binary /usr/local/bin/cephcsi
[main] #5337 is a child of #1643
[main] #5377 uses obsolete binary /usr/bin/python3.9
[main] #5377 is a child of #4592
[main] #5738 uses obsolete binary /usr/bin/ceph-osd
[main] #5738 is a child of #3872
[main] #6427 uses obsolete binary /fluent-bit/bin/fluent-bit
[main] #6427 is a child of #1571
[main] #1559 exe => /usr/bin/containerd-shim-runc-v2
[main] #1559 is containerd.service
[main] #1564 exe => /usr/bin/containerd-shim-runc-v2
[main] #1564 is containerd.service
[main] #1567 exe => /usr/bin/containerd-shim-runc-v2
[main] #1567 is containerd.service
[main] #1571 exe => /usr/bin/containerd-shim-runc-v2
[main] #1571 is containerd.service
[main] #1572 exe => /usr/bin/containerd-shim-runc-v2
[main] #1572 is containerd.service
[main] #1643 exe => /usr/bin/containerd-shim-runc-v2
[main] #1643 is containerd.service
[main] #1731 exe => /usr/bin/containerd-shim-runc-v2
[main] #1731 is containerd.service
[main] #1732 exe => /usr/bin/containerd-shim-runc-v2
[main] #1732 is containerd.service
[main] #3872 exe => /usr/bin/containerd-shim-runc-v2
[main] #3872 is containerd.service
[main] #3940 exe => /usr/bin/containerd-shim-runc-v2
[main] #3940 is containerd.service
[main] #4592 exe => /usr/bin/containerd-shim-runc-v2
[main] #4592 is containerd.service
[main] inside container or vm, skipping microcode checks
[Kernel] Linux: kernel release 6.8.0-49-generic, kernel version #49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 4 02:06:24 UTC 2024
[Kernel/Linux] /boot/vmlinuz.old => 6.8.0-48-generic (buildd@lcy02-amd64-010) #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 [6.8.0-48-generic]
[Kernel/Linux] /boot/vmlinuz-6.8.0-49-generic => 6.8.0-49-generic (buildd@lcy02-amd64-028) #49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 4 02:06:24 UTC 2024 [6.8.0-49-generic]*
[Kernel/Linux] /boot/vmlinuz-6.8.0-48-generic => 6.8.0-48-generic (buildd@lcy02-amd64-010) #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 [6.8.0-48-generic]
[Kernel/Linux] /boot/vmlinuz => 6.8.0-49-generic (buildd@lcy02-amd64-028) #49-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 4 02:06:24 UTC 2024 [6.8.0-49-generic]*
[Kernel/Linux] Expected linux version: 6.8.0-49-generic
WARN - Kernel: 6.8.0-49-generic, Services: 1 (!), Containers: none, Sessions: none|Kernel=0;0;;0;2 Services=1;;0;0 Containers=0;;0;0 Sessions=0;0;;0
Services:
- containerd.service
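To see at a glance which flagged processes lead to the containerd.service entry, the verbose log above can be summarised with a small script. This is only a sketch: the name blamed_services is made up for illustration, and the patterns assume the exact [main] line formats shown in this log, which may differ between needrestart versions.

```python
import re
from collections import defaultdict

# Patterns for the three [main] line shapes seen in the verbose log above.
OBSOLETE = re.compile(r"\[main\] #(\d+) uses obsolete binary (.+)")
CHILD = re.compile(r"\[main\] #(\d+) is a child of #(\d+)")
SERVICE = re.compile(r"\[main\] #(\d+) is (\S+\.service)")


def blamed_services(log_text):
    """Map each reported service to the obsolete binaries found below it.

    Hypothetical helper, not part of needrestart.
    """
    binaries, parents, services = {}, {}, {}
    for line in log_text.splitlines():
        line = line.strip()
        if m := OBSOLETE.fullmatch(line):
            binaries[m.group(1)] = m.group(2)
        elif m := CHILD.fullmatch(line):
            parents[m.group(1)] = m.group(2)
        elif m := SERVICE.fullmatch(line):
            services[m.group(1)] = m.group(2)

    result = defaultdict(set)
    for pid, binary in binaries.items():
        # Walk up the parent chain until a PID mapped to a service is found.
        current, seen = pid, set()
        while current not in services and current in parents and current not in seen:
            seen.add(current)
            current = parents[current]
        if current in services:
            result[services[current]].add(binary)
    return {svc: sorted(bins) for svc, bins in result.items()}
```

Feeding it the output of needrestart -rl -p -v should list every container binary under containerd.service, which makes it easy to spot that none of them belong to the host.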
It started happening just this morning, so I guess it's Canonical's packaging of needrestart that is to blame?
Start-Date: 2024-11-20 18:16:04
Commandline: /usr/bin/unattended-upgrade
Upgrade: needrestart:amd64 (3.6-7ubuntu4.1, 3.6-7ubuntu4.3)
End-Date: 2024-11-20 18:16:05
UPDATE: confirmed, after downgrading to 3.6-7ubuntu4.1 it works as expected. So it's the 3.6-7ubuntu4.3 release that is at fault. Not sure where to report it though.
UPDATE 2: reported it at https://bugs.launchpad.net/ubuntu/+source/needrestart/+bug/2089193 So I guess it should be closed here?
Will need to validate - A patch was recently merged from upstream for https://ubuntu.com/security/notices/USN-7117-1
Can you also run the upstream version on the machine and see if it behaves the same?
Likewise for Debian 12 (Bookworm) with needrestart 3.6-4+deb12u2
I cloned needrestart locally and with bisect found that the problem was introduced in the following commit: 6ce6136
root@hostname:/tmp/needrestart# git checkout 2da2ae
Previous HEAD position was 6ce6136 core: prevent race condition on /proc/$PID/exec evaluation
HEAD is now at 2da2ae2 [Core] refactor device number comparison to be independent of leading zeros (closes #286)
root@hostname:/tmp/needrestart# ./needrestart -rl -p
OK - Kernel: 5.15.0-122-generic, Services: none, Containers: none, Sessions: none|Kernel=0;0;;0;2 Services=0;;0;0 Containers=0;;0;0 Sessions=0;0;;0
root@hostname:/tmp/needrestart# git checkout 6ce613
Previous HEAD position was 2da2ae2 [Core] refactor device number comparison to be independent of leading zeros (closes #286)
HEAD is now at 6ce6136 core: prevent race condition on /proc/$PID/exec evaluation
root@hostname:/tmp/needrestart# ./needrestart -rl -p
WARN - Kernel: 5.15.0-122-generic, Services: 1 (!), Containers: none, Sessions: none|Kernel=0;0;;0;2 Services=1;;0;0 Containers=0;;0;0 Sessions=0;0;;0
Services:
- containerd.service
We have seen downstream reports in Debian which might be related:
https://bugs.debian.org/1087957
https://bugs.debian.org/1088012
But they are not yet further analyzed.
I cloned needrestart locally and with bisect found that the problem was introduced in the following commit: 6ce6136
Thanks! Could you please provide the output of needrestart -v and a ls -lha /proc/$PID/exe?
# ./needrestart -v
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v3.7
[main] running in root mode
[Core] Using UI 'NeedRestart::UI::stdio'...
[main] systemd detected
[main] vm detected
[main] #1737 uses obsolete binary /pause
[LXC] LXD installed via snap
[main] #1737 is a child of #1595
[main] #1906 uses obsolete binary /pause
[main] #1906 is a child of #1717
[main] #2177 uses obsolete binary /pause
[main] #2177 is a child of #1967
[main] #2450 uses obsolete binary /pause
[main] #2450 is a child of #2287
[main] #2572 uses obsolete binary /usr/bin/cadvisor
[main] #2572 is a child of #1717
[main] #2612 uses obsolete binary /bin/node_exporter
[main] #2612 is a child of #1595
[main] #2854 uses obsolete binary /app/containerd-exporter
[main] #2854 is a child of #2287
[main] #3051 uses obsolete binary /usr/local/bin/kube-router
[main] #3051 is a child of #1967
[main] #493899 uses obsolete binary /pause
[main] #493899 is a child of #493878
[main] #495304 uses obsolete binary /usr/local/bin/crowdsec
[main] #495304 is a child of #493878
[main] #1075942 uses obsolete binary /pause
[main] #1075942 is a child of #1075912
[main] #1075984 uses obsolete binary /usr/local/bin/rook
[main] #1075984 is a child of #1075912
[main] #1076867 uses obsolete binary /pause
[main] #1076867 is a child of #1076829
[main] #1077361 uses obsolete binary /usr/bin/ceph-exporter
[main] #1077361 is a child of #1076829
[main] #1138821 uses obsolete binary /pause
[main] #1138821 is a child of #1138779
[main] #1143518 uses obsolete binary /usr/bin/ceph-osd
[main] #1143518 is a child of #1138779
[main] #1146068 uses obsolete binary /pause
[main] #1146068 is a child of #1145965
[main] #1146333 uses obsolete binary /pause
[main] #1146333 is a child of #1146226
[main] #1149093 uses obsolete binary /usr/bin/ceph-osd
[main] #1149093 is a child of #1146226
[main] #1149857 uses obsolete binary /usr/bin/ceph-osd
[main] #1149857 is a child of #1145965
[main] #1210451 uses obsolete binary /pause
[main] #1210451 is a child of #1210406
[main] #1212777 uses obsolete binary /usr/local/bin/cephcsi
[main] #1212777 is a child of #1210406
[main] #1212830 uses obsolete binary /csi-node-driver-registrar
[main] #1212830 is a child of #1210406
[main] #1950048 uses obsolete binary /pause
[main] #1950048 is a child of #1950027
[main] #1950449 uses obsolete binary /node-problem-detector
[main] #1950449 is a child of #1950027
[main] #1951392 uses obsolete binary /pause
[main] #1951392 is a child of #1951314
[main] #1951430 uses obsolete binary /pause
[main] #1951430 is a child of #1951378
[main] #1952251 uses obsolete binary /usr/bin/radosgw
[main] #1952251 is a child of #1951378
[main] #1952751 uses obsolete binary /usr/bin/python3.9
[main] #1952751 is a child of #1951314
[main] #1962281 uses obsolete binary /pause
[main] #1962281 is a child of #1962241
[main] #1962615 uses obsolete binary /usr/bin/radosgw
[main] #1962615 is a child of #1962241
[main] #1970255 uses obsolete binary /pause
[main] #1970255 is a child of #1970221
[main] #1970730 uses obsolete binary /usr/bin/radosgw
[main] #1970730 is a child of #1970221
[main] #2461058 uses obsolete binary /pause
[main] #2461058 is a child of #2461036
[main] #2461345 uses obsolete binary /app/oom-exporter
[main] #2461345 is a child of #2461036
[main] #2844503 uses obsolete binary /pause
[main] #2844503 is a child of #2844466
[main] #2844641 uses obsolete binary /fluent-bit/bin/fluent-bit
[main] #2844641 is a child of #2844466
[main] #3618231 uses obsolete binary /pause
[main] #3618231 is a child of #3618210
[main] #3618517 uses obsolete binary /speaker
[main] #3618517 is a child of #3618210
[main] #1595 exe => /usr/bin/containerd-shim-runc-v2
[main] #1595 is containerd.service
[main] #1717 exe => /usr/bin/containerd-shim-runc-v2
[main] #1717 is containerd.service
[main] #1967 exe => /usr/bin/containerd-shim-runc-v2
[main] #1967 is containerd.service
[main] #2287 exe => /usr/bin/containerd-shim-runc-v2
[main] #2287 is containerd.service
[main] #493878 exe => /usr/bin/containerd-shim-runc-v2
[main] #493878 is containerd.service
[main] #1075912 exe => /usr/bin/containerd-shim-runc-v2
[main] #1075912 is containerd.service
[main] #1076829 exe => /usr/bin/containerd-shim-runc-v2
[main] #1076829 is containerd.service
[main] #1138779 exe => /usr/bin/containerd-shim-runc-v2
[main] #1138779 is containerd.service
[main] #1145965 exe => /usr/bin/containerd-shim-runc-v2
[main] #1145965 is containerd.service
[main] #1146226 exe => /usr/bin/containerd-shim-runc-v2
[main] #1146226 is containerd.service
[main] #1210406 exe => /usr/bin/containerd-shim-runc-v2
[main] #1210406 is containerd.service
[main] #1950027 exe => /usr/bin/containerd-shim-runc-v2
[main] #1950027 is containerd.service
[main] #1951314 exe => /usr/bin/containerd-shim-runc-v2
[main] #1951314 is containerd.service
[main] #1951378 exe => /usr/bin/containerd-shim-runc-v2
[main] #1951378 is containerd.service
[main] #1962241 exe => /usr/bin/containerd-shim-runc-v2
[main] #1962241 is containerd.service
[main] #1970221 exe => /usr/bin/containerd-shim-runc-v2
[main] #1970221 is containerd.service
[main] #2461036 exe => /usr/bin/containerd-shim-runc-v2
[main] #2461036 is containerd.service
[main] #2844466 exe => /usr/bin/containerd-shim-runc-v2
[main] #2844466 is containerd.service
[main] #3618210 exe => /usr/bin/containerd-shim-runc-v2
[main] #3618210 is containerd.service
[main] inside container or vm, skipping microcode checks
[Kernel] Linux: kernel release 5.15.0-122-generic, kernel version #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024
Failed to load NeedRestart::Kernel::kFreeBSD: [Kernel/kFreeBSD] Not running on GNU/kFreeBSD!
[Kernel/Linux] /boot/vmlinuz.old => 5.15.0-119-generic (buildd@lcy02-amd64-075) #129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024 [5.15.0-119-generic]
[Kernel/Linux] /boot/vmlinuz-5.15.0-122-generic => 5.15.0-122-generic (buildd@lcy02-amd64-034) #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024 [5.15.0-122-generic]*
[Kernel/Linux] /boot/vmlinuz-5.15.0-119-generic => 5.15.0-119-generic (buildd@lcy02-amd64-075) #129-Ubuntu SMP Fri Aug 2 19:25:20 UTC 2024 [5.15.0-119-generic]
[Kernel/Linux] /boot/vmlinuz => 5.15.0-122-generic (buildd@lcy02-amd64-034) #132-Ubuntu SMP Thu Aug 29 13:45:52 UTC 2024 [5.15.0-122-generic]*
[Kernel/Linux] Expected linux version: 5.15.0-122-generic
Running kernel seems to be up-to-date.
Services to be restarted:
systemctl restart containerd.service
No containers need to be restarted.
No user sessions are running outdated binaries.
No VM guests are running outdated hypervisor (qemu) binaries on this host.
# ls -lha /proc/493878/exe
lrwxrwxrwx 1 root root 0 Oct 31 23:12 /proc/493878/exe -> /usr/bin/containerd-shim-runc-v2
# ls -lha /proc/493899/exe
lrwxrwxrwx 1 65535 65535 0 Oct 31 23:12 /proc/493899/exe -> /pause
I picked just a random process, as I wasn't sure what $PID you're referring to @liske
Thanks @zerkms !
Tested on Debian bookworm with git HEAD after spawning a docker container (with containerd) but I was not able to reproduce it, yet 🤔
Looking at all the downstream bugs and your bisect:
- seems to be related to various types of containers (containerd, lxc)
- seems not to trigger on all containers (setups) - still looking for a minimal reproducible example
- Debian #1088012 reports for needrestart 3.7 on trixie, so it seems to affect all releases with the patch
after spawning a docker container
What process did you start as the "root" process of the container? Any chance the path to that process exists on the host filesystem, hence it looks "okay"?
UPDATE: okay, with just docker I cannot reproduce it either (and what is also unfortunate - I never in my life coded in perl so cannot give a hand in debugging it locally).
But it looks like running it through docker triggers some other internal machinery:
[main] #38899 uses obsolete binary /usr/bin/dumb-init
[docker] #38899 is part of docker container '9693c8c0416b5334bf6bfa87469270ab22c4f88c3fb8bca943fde0dd29798780' and will be ignored
So it knows about docker. But possibly it does not recognise it as a containerised process when it's naked containerd/runc?
okay, here is a repro with containerd (from a host that runs docker):
# ctr i pull docker.io/library/nginx:latest
# ctr run --rm docker.io/library/nginx:latest test
second terminal:
# needrestart -rl -p
WARN - Kernel: 6.8.0-48-generic, Services: 1 (!), Containers: none, Sessions: none|Kernel=0;0;;0;2 Services=1;;0;0 Containers=0;;0;0 Sessions=0;0;;0
Services:
- containerd.service
Relevant output:
[main] #892340 uses obsolete binary /usr/sbin/nginx
[main] #892340 is a child of #892320
[main] #892374 uses obsolete binary /usr/sbin/nginx
[main] #892374 is a child of #892340
[main] #892375 uses obsolete binary /usr/sbin/nginx
[main] #892375 is a child of #892340
(just ensure that the host filesystem does not have /usr/sbin/nginx)
🤦 This is the trigger! Proc::ProcessTable does not provide a value in the exec field (line 533) if the exe symlink points to a non-existing file in the filesystem where needrestart is called (it does not check in /proc/$PID/root).
I need to take a look at how to deal with this as the patch prevents a race condition. Sorry it's too late for me to look at it just now.
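To check on an affected host whether a given PID falls into this case, the exe link target can be tested both against the host filesystem and against the process's own root. A minimal sketch, assuming Linux procfs and enough privileges to read the other process's entries; exe_visibility and its proc parameter are hypothetical names used only for illustration, and this is not how needrestart or Proc::ProcessTable implement the check.

```python
import os

# Suffix the kernel appends to the exe link when the binary was unlinked.
DELETED_SUFFIX = " (deleted)"


def exe_visibility(pid, proc="/proc"):
    """Return (target, exists_on_host, exists_in_process_root) for one PID.

    Hypothetical helper for illustration only.
    """
    pid_dir = os.path.join(proc, str(pid))
    target = os.readlink(os.path.join(pid_dir, "exe"))
    if target.endswith(DELETED_SUFFIX):
        target = target[: -len(DELETED_SUFFIX)]

    # Lookup in the filesystem where this script runs (the host).
    on_host = os.path.exists(target)
    # Lookup below /proc/<pid>/root, i.e. the filesystem the process sees.
    # Caveat: symlinks inside that tree may not resolve the way they
    # would for the process itself.
    in_root = os.path.exists(os.path.join(pid_dir, "root", target.lstrip("/")))
    return target, on_host, in_root
```

For a containerised process such as the /pause or nginx examples in this thread, one would expect the host lookup to fail and the lookup under root to succeed, which matches the condition described above.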
I have the same problem, but the non-existing file in the host filesystem is not the distinguishing feature.
Both examples are from inside a docker container. This is the obsolete binary that triggers a restart:
[main] #994 uses obsolete binary /usr/bin/python3.10
[main] #994 is a child of #943
but this does not (and does not look for the parent, but detects the container):
[main] #1235 uses obsolete binary /usr/sbin/nginx
[docker] #1235 is part of docker container 'e5f78a11696b5454d470ddc8fa47ef19481a6caa1d91132594f309adb777afa0' and will be ignored
Neither binary is part of the host filesystem:
# ls -l /usr/bin/python3.10
ls: cannot access '/usr/bin/python3.10': No such file or directory
# ls -l /usr/sbin/nginx
ls: cannot access '/usr/sbin/nginx': No such file or directory
The host system is debian 11 in a VM on proxmox and needrestart has been updated to 3.5-4+deb11u4 on 2024-11-20. I have patched the docker.pm with #234 to fix the cgroup v2 problem.
I'll try to reproduce the problem, but until then I won't change anything on this machine and can help with any data that you need for debugging.
Some more information since I'm able to reproduce the problem. I have pinned the docker packages from the docker repo, since the vm is running some legacy images (see at the end for the versions).
Now I can reproduce the problem with this docker command:
docker run -it --rm --pid host nicolargo/glances python3
The important switch is --pid host and the started app must be python3, not /bin/sh. Maybe this will give you enough information for the fix. I will try to reproduce with a clean debian 11 vm, if possible, but not today.
The docker packages used are:
# apt list --upgradable
containerd.io/bullseye 1.7.23-1 amd64 [upgradable from: 1.6.27-1]
docker-buildx-plugin/bullseye 0.17.1-1~debian.11~bullseye amd64 [upgradable from: 0.11.2-1~debian.11~bullseye]
docker-ce-cli/bullseye 5:27.3.1-1~debian.11~bullseye amd64 [upgradable from: 5:24.0.7-1~debian.11~bullseye]
docker-ce-rootless-extras/bullseye 5:27.3.1-1~debian.11~bullseye amd64 [upgradable from: 5:24.0.7-1~debian.11~bullseye]
docker-ce/bullseye 5:27.3.1-1~debian.11~bullseye amd64 [upgradable from: 5:24.0.7-1~debian.11~bullseye]
docker-compose-plugin/bullseye 2.29.7-1~debian.11~bullseye amd64 [upgradable from: 2.21.0-1~debian.11~bullseye]
Test with a clean debian 12 VM shows the same with current updates and needrestart 3.6-4+deb12u2:
# needrestart -r l -v
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v3.6
[main] running in root mode
[Core] Using UI 'NeedRestart::UI::stdio'...
[main] systemd detected
[main] vm detected
[Core] #591 is a NeedRestart::Interp::Python
[Python] #591: source=/usr/share/unattended-upgrades/unattended-upgrade-shutdown
[main] #1971661 uses obsolete binary /usr/bin/python3.12
[main] #1971661 is a child of #1971642
[main] #1971642 exe => /usr/bin/containerd-shim-runc-v2
[main] #1971642 is containerd.service
The suspect line 533 was added to avoid unnecessary further tests. This is wrong because of the way the Proc::ProcessTable module works. Dropping the line fixes the example provided by @zerkms on my host.
I will do some more testing and review before merging this into the master branch. Feel free to give 42af5d3 a try and report back here, thanks!
The change from 42af5d3 fixes the problem for both my debian 11 and debian 12 occurrences. Thank you!
FWIW, I have uploaded for Debian unstable https://tracker.debian.org/news/1588733/accepted-needrestart-37-32-source-into-unstable/ and wanted to give it a bit more exposure before doing a regression update.
I see you just amended the commit with 63c0f1b so I guess it's worth waiting.
@liske I got a downstream report that LXC containers are still restarted. I asked if we can get it minimally reproduced. Do you want to have this untangled from this report? It still seems to be a regression from the CVE-2024-48991 fix so far.
Context: https://bugs.debian.org/1088047#54
From my side I want to confirm that the latest patch (ubuntu packaged) has fixed my originally reported problem.
Thanks to all who participated and resolved it, and @liske personally.
PS: Should we close this ticket and create another one for LXC? (I don't mind either way though)
The 42af5d3 patch (needrestart 3.7-3.2 in Debian unstable) had a problem when a process is in another mountns, resulting in the LXC false positives. After having some more time for reviewing, and with feedback from Qualys (thanks again!), the patch has been withdrawn and replaced by 63c0f1b. This seems to be in good shape and will be merged into the main branch.
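Whether a process lives in another mount namespace can be checked by comparing the ns/mnt links under /proc. The sketch below assumes Linux procfs and sufficient privileges to read the other process's namespace link; mount_namespace and in_other_mountns are hypothetical names for illustration and do not reflect what 63c0f1b does.

```python
import os


def mount_namespace(pid, proc="/proc"):
    """Return the mount namespace identifier of a PID, e.g. 'mnt:[4026531840]'."""
    return os.readlink(os.path.join(proc, str(pid), "ns", "mnt"))


def in_other_mountns(pid, reference="self", proc="/proc"):
    """True if pid is in a different mount namespace than the reference process.

    Hypothetical helper; 'self' refers to the process running this check.
    """
    return mount_namespace(pid, proc) != mount_namespace(reference, proc)
```

Called as in_other_mountns(493899) on the host, this would be expected to return True for a containerised /pause process, since container runtimes normally give each container its own mount namespace.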
(In case you wonder why the fix is more complicated…)
There are two values used for the analysis of the process binary in needrestart:
- the exec value for the PID from Proc::ProcessTable:
  - was parsed earlier, when needrestart was started
  - can be broken (undef) for various reasons (replaced binary, mountns)
  - has pointed to an existing binary in the root mountns at least once (else it would be undef)
- the readlink value on /proc/$PID/exe (the $exe variable in the patch):
  - retrieved later in needrestart's analysis loop
  - required to detect if the binary was replaced
  - also works for processes inside mountns (read: containers)
  - can be fragile ((deleted) suffix etc.)
CVE-2024-48991 was possible because $exe and the value of exec were considered to always be equal, which was not guaranteed. The broken patch 42af5d3 allowed an undef value in exec to be used for the analysis, still resulting in false positives (for containers).
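The relationship between the two values can be illustrated with a small comparison. This is a sketch under stated assumptions, not the logic of 42af5d3 or 63c0f1b: compare_exec_values and its return labels are invented here, snapshot_exec stands for the earlier exec value (None playing the role of Perl's undef), and exe_link stands for the later readlink result.

```python
# Suffix the kernel appends to /proc/<pid>/exe when the binary was unlinked.
DELETED_SUFFIX = " (deleted)"


def normalize_exe(link_target):
    """Split a readlink result into (path, was_deleted)."""
    if link_target.endswith(DELETED_SUFFIX):
        return link_target[: -len(DELETED_SUFFIX)], True
    return link_target, False


def compare_exec_values(snapshot_exec, exe_link):
    """Classify how the earlier snapshot relates to the later readlink value.

    Hypothetical illustration only; labels are not needrestart terminology.
    """
    if snapshot_exec is None:
        # No usable snapshot (e.g. binary not visible in the root mountns):
        # the two values cannot be assumed to describe the same file.
        return "snapshot-missing"
    path, deleted = normalize_exe(exe_link)
    if path != snapshot_exec:
        # The values diverged between snapshot and analysis.
        return "mismatch"
    return "deleted" if deleted else "consistent"
```

The point of keeping the cases separate is that only the "consistent" and "deleted" outcomes describe a situation where both values refer to the same path; treating the other two as if they did is the kind of assumption the comment above describes as unsafe.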