Cleanup created containers on `receptor work cancel`
chrismeyersfsu opened this issue · 5 comments
Background
Stopping the underlying container when receptor work cancel is run is proving harder than we thought, because the underlying container runs in a separate process tree.
Let's look at the two process trees and reason about how, and whether, we can reliably kill the disconnected podman tree from the receptor tree.
Below we have sleep.yml running on an execution node.
Note that we effectively have two process trees here. Let's give them names:
- receptor tree
- podman tree
ps xafo pid,ppid,pgid,sid,tname,tpgid,command | cat
PID PPID PGID SID TTY TPGID COMMAND
33280 1 33280 33280 ? -1 /usr/bin/receptor -c /etc/receptor/receptor.conf
48006 33280 48006 48006 ? -1 \_ /usr/bin/receptor --log-level info --command-runner command=ansible-runner params=worker --private-data-dir=/tmp/awx_3_t5euoymz --delete unitdir=/tmp/receptor/ec2-54-166-76-213.compute-1.amazonaws.com/qbWl7jHt
48013 48006 48006 48006 ? -1 \_ /usr/bin/python3.8 /usr/bin/ansible-runner worker --private-data-dir=/tmp/awx_3_t5euoymz --delete
48021 48013 48021 48021 pts/1 48021 \_ /usr/bin/podman run --rm --tty --interactive --workdir /runner/project -v /tmp/awx_3_t5euoymz/:/runner/:Z --authfile=/tmp/ansible_runner_registry_3_vmglxwxo/auth.json --env-file /tmp/awx_3_t5euoymz/artifacts/3/env.list --quiet --name ansible_runner_3 --user=root brew.registry.redhat.io/rh-osbs/ansible-automation-platform-21-ee-supported-rhel8:latest ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48030 48021 48021 48021 pts/1 48021 \_ /usr/bin/podman run --rm --tty --interactive --workdir /runner/project -v /tmp/awx_3_t5euoymz/:/runner/:Z --authfile=/tmp/ansible_runner_registry_3_vmglxwxo/auth.json --env-file /tmp/awx_3_t5euoymz/artifacts/3/env.list --quiet --name ansible_runner_3 --user=root brew.registry.redhat.io/rh-osbs/ansible-automation-platform-21-ee-supported-rhel8:latest ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48044 48030 48044 48021 pts/1 48021 \_ /usr/bin/slirp4netns --disable-host-loopback --mtu=65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /tmp/podman-run-1001/netns/cni-a60bb3df-e186-f321-43b1-c45643545924 tap0
...
47819 1 47818 47818 ? -1 /usr/bin/podman
48042 1 48042 48042 ? -1 /usr/bin/fuse-overlayfs -o ,lowerdir=/var/lib/awx/.local/share/containers/storage/overlay/l/BPP3QIXXEKFJBOOIQXOC6XA527:/var/lib/awx/.local/share/containers/storage/overlay/l/XXSETG5Q6N3IYWXF2MK33TFGKV:/var/lib/awx/.local/share/containers/storage/overlay/l/YFKXZRQGTWXOPZA2TJZ7DMOIYK:/var/lib/awx/.local/share/containers/storage/overlay/l/ZECWG32J2U3WUPPEMCROCEOZZF:/var/lib/awx/.local/share/containers/storage/overlay/l/BM5O62ZKZFURIQOOAFOHBUU3X4,upperdir=/var/lib/awx/.local/share/containers/storage/overlay/5495a4bf931b93dfc9fb31889c54a5fcc6228a13efb7fb4683c5f8ef4372e48e/diff,workdir=/var/lib/awx/.local/share/containers/storage/overlay/5495a4bf931b93dfc9fb31889c54a5fcc6228a13efb7fb4683c5f8ef4372e48e/work,volatile,context="system_u:object_r:container_file_t:s0:c526,c593" /var/lib/awx/.local/share/containers/storage/overlay/5495a4bf931b93dfc9fb31889c54a5fcc6228a13efb7fb4683c5f8ef4372e48e/merged
48047 1 48047 48047 ? -1 /usr/bin/conmon --api-version 1 -c d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c -u d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c -r /usr/bin/crun -b /var/lib/awx/.local/share/containers/storage/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata -p /tmp/podman-run-1001/containers/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/pidfile -n ansible_runner_3 --exit-dir /tmp/run-1001/libpod/tmp/exits --full-attach -l k8s-file:/var/lib/awx/.local/share/containers/storage/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/ctr.log --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/tmp/podman-run-1001/containers/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/oci-log -t --conmon-pidfile /tmp/podman-run-1001/containers/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/awx/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /tmp/podman-run-1001/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /tmp/run-1001/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c
48050 48047 48050 48050 ? -1 \_ /usr/bin/dumb-init -- ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48071 48050 48071 48071 pts/0 48071 \_ /usr/bin/python3.8 /usr/bin/ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48077 48071 48071 48071 pts/0 48071 \_ /usr/bin/python3.8 /usr/bin/ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48088 48077 48071 48071 pts/0 48071 \_ /bin/sh -c /usr/bin/python3.8 /root/.ansible/tmp/ansible-tmp-1633971632.6541889-26-153245270598929/AnsiballZ_command.py && sleep 0
48089 48088 48071 48071 pts/0 48071 \_ /usr/bin/python3.8 /root/.ansible/tmp/ansible-tmp-1633971632.6541889-26-153245270598929/AnsiballZ_command.py
48090 48089 48071 48071 pts/0 48071 \_ /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 30000
POSIX Process Notes
- If pid == sid, the process is the session leader.
- Upon process exit, if pid == sid == tpgid, then SIGHUP is sent to all processes in the TPGID (the foreground process group).
- A parent usually does NOT send a signal to its children when it exits.
- An orphan process is one whose parent has exited while the child keeps running:
  - the init process becomes the new parent
  - the child is still running
- A zombie process is one whose parent did not wait() on it:
  - every child process is a zombie for a brief period
  - zombie processes do nothing but occupy pid space
  - best practice is for the parent to wait() on its children
- A process can NOT migrate sessions:
  - a process may not create a process group that belongs to another session
  - a process is not permitted to join a process group that is a member of another session
  - the PGID of a session leader can NOT be changed
- kill(-PGID) sends a signal to all processes in that process group (see the sketch below).
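A minimal Python sketch of the last two mechanics (the session-leader check and signaling a whole process group). The helper names are ours, and this is not receptor or runner code:

```python
import os
import signal


def is_session_leader(pid):
    """A process is the session leader when its pid equals its session id."""
    return pid == os.getsid(pid)


def signal_process_group(pgid, sig=signal.SIGTERM):
    """kill(-PGID): deliver a signal to every process in the process group."""
    os.killpg(pgid, sig)  # equivalent to os.kill(-pgid, sig)


# Example using a pid from the capture above (only valid while that process exists):
# signal_process_group(os.getpgid(48021), signal.SIGTERM)
```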
Receptor Tree
receptor daemon
\_ receptor worker
\_ ansible-runner worker
\_ podman run parent
\_ podman run child
Process | PID | PPID | PGID | SID | TPGID | Notes
---|---|---|---|---|---|---
receptor daemon | 33280 | 1 | 33280 | 33280 | -1 |
receptor worker | 48006 | 33280 | 48006 | 48006 | -1 | session leader
ansible-runner worker | 48013 | 48006 | 48006 | 48006 | -1 |
podman run parent | 48021 | 48013 | 48021 | 48021 | 48021 | session leader
podman run child | 48030 | 48021 | 48021 | 48021 | 48021 |
Podman Tree
podman fuse-overlayfs
conmon
\_ dumb-init ansible-playbook
\_ ... (ansible-playbook worker processes)
Process | PID | PPID | PGID | SID | TPGID | Notes
---|---|---|---|---|---|---
podman fuse-overlayfs | 48042 | 1 | 48042 | 48042 | -1 |
conmon | 48047 | 1 | 48047 | 48047 | -1 |
dumb-init | 48050 | 48047 | 48050 | 48050 | -1 |
ansible-playbook worker process | 48071 | 48050 | 48071 | 48071 | 48071 |
ansible-playbook worker process | 48077 | 48071 | 48071 | 48071 | 48071 |
ansible-playbook worker process | 48088 | 48077 | 48071 | 48071 | 48071 |
... | | | | | |
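For reference, the PGID and SID columns in these tables can be reproduced for any live pid with standard os calls (PPID still comes from ps or /proc/<pid>/status). A small sketch, assuming the pids from the capture above still exist:

```python
import os


def describe(pid):
    """Print the PGID and SID columns from the tables above for a live pid."""
    print(f"pid={pid} pgid={os.getpgid(pid)} sid={os.getsid(pid)}")


# e.g. for podman run parent and conmon from the capture:
# describe(48021)
# describe(48047)
```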
Playing Around
Let's try sending signals and see what happens.
Process Signals

Signal | Process | Result
---|---|---
SIGINT | ansible-runner worker, podman run parent, podman run child | stuck container
SIGTERM | ansible-runner worker, podman run parent | stuck container
SIGTERM | podman run child | container cleaned up (flaky, see error below)
Alright, we found a case where signaling the podman run child process will clean up the podman tree. However, podman run child isn't really known to receptor nor to ansible-runner worker. Can we send a signal higher up in the process tree, to a process that something we manage knows about?
podman run parent and podman run child are in the same process group. ansible-runner may not know about podman run child, but it does know about podman run parent, so maybe it could send a signal to that process group so that podman run child is killed too.
Note: we are seeing the error below sometimes. This may be a podman bug.
ERRO[0066] Error forwarding signal 15 to container 86fcc1ed7d74f16be34d5bc7f25acdfed71fec206194b6e1a393e1b6c7a2ab40: error sending signal to container 86fcc1ed7d74f16be34d5bc7f25acdfed71fec206194b6e1a393e1b6c7a2ab40: `/usr/bin/crun kill 86fcc1ed7d74f16be34d5bc7f25acdfed71fec206194b6e1a393e1b6c7a2ab40 15` failed: signal: hangup
Signal | Process Group | Result
---|---|---
SIGTERM | -PGID of podman run parent | stuck container
Nope, that didn't work. Maybe because of the error discovered above? On to the next theory.
Thoughts
Even if we could get the right signal sent to the right process in the receptor tree, there is the edge case where all or part of the receptor tree could be SIGKILLed for a number of reasons (OOM being the most common). There are no kernel-level facilities that I can think of that we can lean on to ensure that when a process exits in the receptor tree, the podman process tree exits also. Therefore, the only option that I see is to rely on Receptor to maintain some sort of state about the podman process tree and to clean up the podman process tree from Receptor.
Proposal
Receptor will pass the receptor work directory to ansible-runner worker, e.g. ansible-runner worker --container-pid-file-dir=/tmp/receptor/awx_1/KcipiXHj/pidfiles/. ansible-runner worker will then forward to podman the location to write the pid file via podman run --conmon-pidfile=/tmp/receptor/awx_1/KcipiXHj/pidfiles/conmon.pid. When receptor work cancel KcipiXHj is issued, Receptor will be responsible for sending SIGKILL to any pid recorded in the /tmp/receptor/awx_1/KcipiXHj/pidfiles/ directory.
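As a rough sketch of the cancel-side cleanup described above (receptor itself is written in Go; the helper name and the *.pid file naming here are assumptions, while the directory layout and SIGKILL behavior follow the proposal):

```python
import os
import signal
from pathlib import Path


def kill_pids_for_work_unit(pidfile_dir):
    """Send SIGKILL to every pid recorded under a work unit's pidfiles dir.

    pidfile_dir would be something like /tmp/receptor/awx_1/KcipiXHj/pidfiles/.
    """
    for pidfile in Path(pidfile_dir).glob("*.pid"):
        try:
            pid = int(pidfile.read_text().strip())
        except ValueError:
            continue  # ignore malformed or empty pid files
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process already exited; nothing to clean up
```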
Replaced instances of the word "pod" with "container" to avoid confusion.
I am not sure about passing the pid directory via a command line argument like this.
It seems like it would require some sort of templating / variable substitution in the receptor config.
It may be less intrusive to inject an environment variable like RECEPTOR_PID_DIRECTORY
and check for that within Runner.
> It may be less intrusive to inject an environment variable like RECEPTOR_PID_DIRECTORY and check for that within Runner.
That would have to be unique for each work unit, right? That's the weird thing to me. Otherwise, receptor may have multiple work units, some running, and some not. It needs to know which pid file came from which work unit, because it only uses this pidfile when it gets the cancel command for some specific work unit.
We can't reasonably start on a change to ansible-runner right now, because we would be throwing around speculation about what receptor would provide.
receptor
With the above proposal, we want receptor to move first, establishing (in a PoC) the environment variable that will be set, and instructions about how to write a pid file in a way that will map the pid file to the work unit id. Might the code need to create the directory? Will receptor clean up old directories?
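To illustrate the contract this asks receptor to establish, here is a hypothetical sketch in Python (receptor is Go, and the RECEPTOR_PID_DIRECTORY name is still only a proposal; the directory handling shown is an assumption, not receptor behavior):

```python
import os
import subprocess


def start_work_unit(unit_dir, worker_cmd):
    """Create a per-work-unit pidfiles directory and advertise it to the worker
    via the proposed RECEPTOR_PID_DIRECTORY environment variable."""
    pid_dir = os.path.join(unit_dir, "pidfiles")  # e.g. /tmp/receptor/<node>/<unit-id>/pidfiles
    os.makedirs(pid_dir, exist_ok=True)  # receptor would own creating this directory

    env = dict(os.environ, RECEPTOR_PID_DIRECTORY=pid_dir)
    return subprocess.Popen(worker_cmd, env=env)


# Because the pidfiles directory lives inside the work unit directory, releasing
# the work unit (which removes unit_dir) would also clean up old directories and
# keeps each pid file mapped to its work unit id.
```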
runner
Then we can develop a PoC inside of runner that will read that environment variable and pass an option to podman which will create the pid file. @chrismeyersfsu would it pass only --conmon-pidfile, or would it pass --pidfile as well?
(env) [alancoding@alan-red-hat awx]$ podman run --help | grep "pidfil"
--conmon-pidfile string Path to the file that will receive the PID of conmon
--pidfile string Write the container process ID to the file
We may need to confirm these options work the same in other podman versions.
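A sketch of what the runner-side PoC could look like, assuming the proposed RECEPTOR_PID_DIRECTORY variable and the two podman flags shown above; this is not actual ansible-runner code:

```python
import os


def add_pidfile_args(podman_args):
    """Extend a podman command line with pidfile options when the proposed
    RECEPTOR_PID_DIRECTORY environment variable is set."""
    pid_dir = os.environ.get("RECEPTOR_PID_DIRECTORY")
    if not pid_dir:
        return podman_args  # receptor did not ask for pid files; change nothing

    return podman_args + [
        "--conmon-pidfile=" + os.path.join(pid_dir, "conmon.pid"),
        "--pidfile=" + os.path.join(pid_dir, "container.pid"),
    ]
```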
awx
Or could we just trash both of those ideas and pass the new pidfile option(s) in container_options, like:
https://github.com/ansible/awx/blob/60a357eda167c2c010a76e5073dea532a8342e72/awx/main/tasks.py#L969
Then AWX would have to do some extra cleanup (when?).
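If we went the AWX route, the idea would be something like the following; the flag values and paths are purely illustrative assumptions, not AWX code:

```python
# Hypothetical illustration only: container_options is the parameter referenced
# in the tasks.py link above; the values shown are placeholders.
params = {
    "container_options": [
        "--conmon-pidfile=/tmp/receptor/awx_1/KcipiXHj/pidfiles/conmon.pid",
        "--pidfile=/tmp/receptor/awx_1/KcipiXHj/pidfiles/container.pid",
    ],
}
```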
I don't think this is a problem anymore. @AlanCoding says we have tests for this.