[worker] `podman system service` process accumulation
duckinator opened this issue · 11 comments
When running cirrus worker run with container isolation on Debian 12 and the Podman backend, each task spawns a podman system service process that never exits.
At one point, the system had accumulated 158 of these processes, even though it hadn't run any tasks for over 3 hours.
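For anyone wanting to check a worker for the same accumulation, here is a minimal Go sketch that counts matching processes by scanning /proc. The helper names isPodmanService and countPodmanServices are hypothetical, and the sketch assumes the service is launched with "system service" directly after the podman binary name. Zombie processes have an empty cmdline, so they are not counted.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// isPodmanService reports whether a NUL-separated /proc/<pid>/cmdline
// looks like `podman system service ...`.
// Assumption: no global flags sit between "podman" and "system".
func isPodmanService(cmdline []byte) bool {
	args := bytes.Split(bytes.TrimRight(cmdline, "\x00"), []byte{0})
	if len(args) < 3 {
		return false
	}
	return filepath.Base(string(args[0])) == "podman" &&
		string(args[1]) == "system" &&
		string(args[2]) == "service"
}

// countPodmanServices walks /proc and counts live processes whose
// command line matches isPodmanService. Linux only.
func countPodmanServices() int {
	paths, _ := filepath.Glob("/proc/[0-9]*/cmdline")
	count := 0
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process may have exited between Glob and ReadFile
		}
		if isPodmanService(data) {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println("lingering podman system service processes:", countPodmanServices())
}
```

Running this before and after a task should show whether the count keeps growing.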
WORKAROUND: Periodically stop and restart cirrus worker run.
This line is what runs the process that doesn't exit:
However, my understanding is that this function should eventually be called to kill it:
cirrus-cli/internal/executor/instance/containerbackend/podman_linux.go
Lines 92 to 121 in 95e9f68
I'm unsure whether this function isn't being called, or if it's just not working for some reason.
I changed "-t", "0" to "-t", "1", and it now spawns processes that eventually turn into zombie processes.
This seems to confirm that, for whatever reason, the podman system service processes are neither being waited on nor killed.
I've tried reproducing your issue on Cirrus CLI 0.122.0 and Podman 3.4.4 on a clean ghcr.io/cirruslabs/ubuntu:latest instance with the following configuration, to no avail:
container:
  image: debian:latest

task:
  script: uname -a
#767 might help, though.
I'll test that PR later today and let you know.
For future reference on my part, are you running ghcr.io/cirruslabs/ubuntu:latest via vetu, or something else?
I'm running it via Tart on macOS, but that probably doesn't matter much for this reproduction attempt.
@duckinator what does your .cirrus.yml look like? Which Cirrus CLI and Podman versions are you running, and on which Linux distribution?
We'll probably need to reproduce this somehow first in order to devise a fix (if it's needed at all).
Looking at the code, it should either clean up the instance or return an error:
cirrus-cli/internal/executor/executor.go
Lines 208 to 217 in 2a07c02
Versions and such:
- distro: Debian 12 (bookworm)
- cirrus: 0.122.0-95e9f68
- podman: 4.3.1
The config I'm using on the worker is:
token: "[...]"

security:
  allowed-isolations:
    container: {}

resources:
  cpu: 2

log:
  level: debug
The .cirrus.yml that triggers the worker is: https://github.com/duckinator/bork/blob/184e2c646d521bdfe8adef40c94082787e090944/.cirrus.yml (note that macOS_task, FreeBSD_task, and Windows_task are currently marked as skipped).
Please check out the new 0.122.1 release, which will be available shortly. It should fix the issue you're encountering.
Unfortunately, with cirrus-cli 0.122.1-8ae0752 it's actually worse: the problem is still there, but now the podman processes linger even after I stop cirrus worker run. Previously, stopping it made them exit.
Indeed, I was testing the fix using cirrus run, but the Persistent Worker has a slightly different code path.
This will be fixed in #769.
Sorry for the inconvenience!
Confirmed that using 0.122.2-6faa293 works. Thank you for fixing this so quickly, it's very appreciated!