Cleanup created containers on `receptor work cancel`
chrismeyersfsu opened this issue · 5 comments
Background
Stopping the underlying container when receptor work cancel is run is proving harder than we thought, because the underlying container runs in a separate process tree.
Let's look at the two process trees and reason about how, and whether, we can reliably kill the disconnected podman tree from the receptor tree.
Below we have sleep.yml running on an execution node.
Note that we effectively have two process trees here. Let's give them names:
- receptor tree
- podman tree
ps xafo pid,ppid,pgid,sid,tname,tpgid,command | cat
PID PPID PGID SID TTY TPGID COMMAND
33280 1 33280 33280 ? -1 /usr/bin/receptor -c /etc/receptor/receptor.conf
48006 33280 48006 48006 ? -1 \_ /usr/bin/receptor --log-level info --command-runner command=ansible-runner params=worker --private-data-dir=/tmp/awx_3_t5euoymz --delete unitdir=/tmp/receptor/ec2-54-166-76-213.compute-1.amazonaws.com/qbWl7jHt
48013 48006 48006 48006 ? -1 \_ /usr/bin/python3.8 /usr/bin/ansible-runner worker --private-data-dir=/tmp/awx_3_t5euoymz --delete
48021 48013 48021 48021 pts/1 48021 \_ /usr/bin/podman run --rm --tty --interactive --workdir /runner/project -v /tmp/awx_3_t5euoymz/:/runner/:Z --authfile=/tmp/ansible_runner_registry_3_vmglxwxo/auth.json --env-file /tmp/awx_3_t5euoymz/artifacts/3/env.list --quiet --name ansible_runner_3 --user=root brew.registry.redhat.io/rh-osbs/ansible-automation-platform-21-ee-supported-rhel8:latest ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48030 48021 48021 48021 pts/1 48021 \_ /usr/bin/podman run --rm --tty --interactive --workdir /runner/project -v /tmp/awx_3_t5euoymz/:/runner/:Z --authfile=/tmp/ansible_runner_registry_3_vmglxwxo/auth.json --env-file /tmp/awx_3_t5euoymz/artifacts/3/env.list --quiet --name ansible_runner_3 --user=root brew.registry.redhat.io/rh-osbs/ansible-automation-platform-21-ee-supported-rhel8:latest ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48044 48030 48044 48021 pts/1 48021 \_ /usr/bin/slirp4netns --disable-host-loopback --mtu=65520 --enable-sandbox --enable-seccomp -c -e 3 -r 4 --netns-type=path /tmp/podman-run-1001/netns/cni-a60bb3df-e186-f321-43b1-c45643545924 tap0
...
47819 1 47818 47818 ? -1 /usr/bin/podman
48042 1 48042 48042 ? -1 /usr/bin/fuse-overlayfs -o ,lowerdir=/var/lib/awx/.local/share/containers/storage/overlay/l/BPP3QIXXEKFJBOOIQXOC6XA527:/var/lib/awx/.local/share/containers/storage/overlay/l/XXSETG5Q6N3IYWXF2MK33TFGKV:/var/lib/awx/.local/share/containers/storage/overlay/l/YFKXZRQGTWXOPZA2TJZ7DMOIYK:/var/lib/awx/.local/share/containers/storage/overlay/l/ZECWG32J2U3WUPPEMCROCEOZZF:/var/lib/awx/.local/share/containers/storage/overlay/l/BM5O62ZKZFURIQOOAFOHBUU3X4,upperdir=/var/lib/awx/.local/share/containers/storage/overlay/5495a4bf931b93dfc9fb31889c54a5fcc6228a13efb7fb4683c5f8ef4372e48e/diff,workdir=/var/lib/awx/.local/share/containers/storage/overlay/5495a4bf931b93dfc9fb31889c54a5fcc6228a13efb7fb4683c5f8ef4372e48e/work,volatile,context="system_u:object_r:container_file_t:s0:c526,c593" /var/lib/awx/.local/share/containers/storage/overlay/5495a4bf931b93dfc9fb31889c54a5fcc6228a13efb7fb4683c5f8ef4372e48e/merged
48047 1 48047 48047 ? -1 /usr/bin/conmon --api-version 1 -c d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c -u d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c -r /usr/bin/crun -b /var/lib/awx/.local/share/containers/storage/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata -p /tmp/podman-run-1001/containers/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/pidfile -n ansible_runner_3 --exit-dir /tmp/run-1001/libpod/tmp/exits --full-attach -l k8s-file:/var/lib/awx/.local/share/containers/storage/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/ctr.log --log-level warning --runtime-arg --log-format=json --runtime-arg --log --runtime-arg=/tmp/podman-run-1001/containers/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/oci-log -t --conmon-pidfile /tmp/podman-run-1001/containers/overlay-containers/d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /var/lib/awx/.local/share/containers/storage --exit-command-arg --runroot --exit-command-arg /tmp/podman-run-1001/containers --exit-command-arg --log-level --exit-command-arg warning --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /tmp/run-1001/libpod/tmp --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg overlay --exit-command-arg --storage-opt --exit-command-arg overlay.mount_program=/usr/bin/fuse-overlayfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg d37baddb89438f24a88c454beb96c8b50f29fc81c6e47b3590a7aff6d44e6f0c
48050 48047 48050 48050 ? -1 \_ /usr/bin/dumb-init -- ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48071 48050 48071 48071 pts/0 48071 \_ /usr/bin/python3.8 /usr/bin/ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48077 48071 48071 48071 pts/0 48071 \_ /usr/bin/python3.8 /usr/bin/ansible-playbook -u root -i /runner/inventory/hosts -e @/runner/env/extravars sleep.yml
48088 48077 48071 48071 pts/0 48071 \_ /bin/sh -c /usr/bin/python3.8 /root/.ansible/tmp/ansible-tmp-1633971632.6541889-26-153245270598929/AnsiballZ_command.py && sleep 0
48089 48088 48071 48071 pts/0 48071 \_ /usr/bin/python3.8 /root/.ansible/tmp/ansible-tmp-1633971632.6541889-26-153245270598929/AnsiballZ_command.py
48090 48089 48071 48071 pts/0 48071 \_ /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 30000
POSIX Process Notes
- If pid == sid, the process is the session leader.
- Upon process exit, if pid == sid == tpgid, then SIGHUP is sent to all processes in the TPGID (the foreground process group).
- A parent usually does NOT send a signal to its children when it exits.
- An orphan process is one whose parent has exited while the child keeps running:
  - the init process becomes the new parent
  - the child is still running
- A zombie process is one whose parent did not wait() on it:
  - every child process is a zombie for a brief period
  - zombie processes do nothing but occupy pid space
  - best practice is for the parent to wait() on its children
- A process can NOT migrate sessions:
  - a process may not create a process group that belongs to another session
  - a process is not permitted to join a process group that is a member of another session
  - the PGID of a session leader can NOT be changed
- kill(-PGID) sends a signal to all processes in that process group (see the sketch below).
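A minimal Python sketch of the last two mechanics (the session-leader check and signaling a whole process group). The helper names are ours, and this is not receptor or runner code:

```python
import os
import signal


def is_session_leader(pid):
    """A process is the session leader when its pid equals its session id."""
    return pid == os.getsid(pid)


def signal_process_group(pgid, sig=signal.SIGTERM):
    """kill(-PGID): deliver a signal to every process in the process group."""
    os.killpg(pgid, sig)  # equivalent to os.kill(-pgid, sig)


# Example using a pid from the capture above (only valid while that process exists):
# signal_process_group(os.getpgid(48021), signal.SIGTERM)
```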
Receptor Tree
receptor daemon
\_ receptor worker
\_ ansible-runner worker
\_ podman run parent
\_ podman run child
Process | PID | PPID | PGID | SID | TPGID | Notes
---|---|---|---|---|---|---
receptor daemon | 33280 | 1 | 33280 | 33280 | -1 |
receptor worker | 48006 | 33280 | 48006 | 48006 | -1 | session leader
ansible-runner worker | 48013 | 48006 | 48006 | 48006 | -1 |
podman run parent | 48021 | 48013 | 48021 | 48021 | 48021 | session leader
podman run child | 48030 | 48021 | 48021 | 48021 | 48021 |
Podman Tree
podman fuse-overlayfs
conmon
\_ dumb-init ansible-playbook
\_ ... (ansible-playbook worker processes)
Process | PID | PPID | PGID | SID | TPGID | Notes
---|---|---|---|---|---|---
podman fuse-overlayfs | 48042 | 1 | 48042 | 48042 | -1 |
conmon | 48047 | 1 | 48047 | 48047 | -1 |
dumb-init | 48050 | 48047 | 48050 | 48050 | -1 |
ansible-playbook worker process | 48071 | 48050 | 48071 | 48071 | 48071 |
ansible-playbook worker process | 48077 | 48071 | 48071 | 48071 | 48071 |
ansible-playbook worker process | 48088 | 48077 | 48071 | 48071 | 48071 |
... | | | | | |
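For reference, the PGID and SID columns in these tables can be reproduced for any live pid with standard os calls (PPID still comes from ps or /proc/<pid>/status). A small sketch, assuming the pids from the capture above still exist:

```python
import os


def describe(pid):
    """Print the PGID and SID columns from the tables above for a live pid."""
    print(f"pid={pid} pgid={os.getpgid(pid)} sid={os.getsid(pid)}")


# e.g. for podman run parent and conmon from the capture:
# describe(48021)
# describe(48047)
```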
Playing Around
Let's try sending signals and see what happens.
Process Signals

Signal | Process | Result
---|---|---
SIGINT | ansible-runner worker, podman run parent, podman run child | stuck container
SIGTERM | ansible-runner worker, podman run parent | stuck container
SIGTERM | podman run child | container cleaned up (flaky, see error below)
Alright, we found a case where signaling the podman run child process will clean up the podman tree. However, podman run child isn't really known to receptor nor to ansible-runner worker. Can we send a signal higher up in the process tree, to a process that something we manage knows about?
podman run parent and podman run child are in the same process group. ansible-runner may not know about podman run child, but it does know about podman run parent, so maybe it could send a signal to that process group so that podman run child is killed too.
Note: we are seeing the error below sometimes. This may be a podman bug.
ERRO[0066] Error forwarding signal 15 to container 86fcc1ed7d74f16be34d5bc7f25acdfed71fec206194b6e1a393e1b6c7a2ab40: error sending signal to container 86fcc1ed7d74f16be34d5bc7f25acdfed71fec206194b6e1a393e1b6c7a2ab40: `/usr/bin/crun kill 86fcc1ed7d74f16be34d5bc7f25acdfed71fec206194b6e1a393e1b6c7a2ab40 15` failed: signal: hangup
Signal | Process Group | Result
---|---|---
SIGTERM | -PGID of podman run parent | stuck container
Nope, that didn't work. Maybe because of the error discovered above? On to the next theory.
Thoughts
Even if we could get the right signal sent to the right process in the receptor tree, there is the edge case where all or part of the receptor tree could be SIGKILLed for a number of reasons (OOM being the most common). There are no kernel-level facilities that I can think of that we can lean on to ensure that when a process exits in the receptor tree, the podman process tree exits also. Therefore, the only option that I see is to rely on Receptor to maintain some sort of state about the podman process tree and to clean up the podman process tree from Receptor.
Proposal
Receptor will pass the receptor work directory to ansible-runner worker, e.g. ansible-runner worker --container-pid-file-dir=/tmp/receptor/awx_1/KcipiXHj/pidfiles/. ansible-runner worker will then forward to podman the location to write the pid file via podman run --conmon-pidfile=/tmp/receptor/awx_1/KcipiXHj/pidfiles/conmon.pid. When receptor work cancel KcipiXHj is issued, Receptor will be responsible for sending SIGKILL to any pid recorded in the /tmp/receptor/awx_1/KcipiXHj/pidfiles/ directory.
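As a rough sketch of the cancel-side cleanup described above (receptor itself is written in Go; the helper name and the *.pid file naming here are assumptions, while the directory layout and SIGKILL behavior follow the proposal):

```python
import os
import signal
from pathlib import Path


def kill_pids_for_work_unit(pidfile_dir):
    """Send SIGKILL to every pid recorded under a work unit's pidfiles dir.

    pidfile_dir would be something like /tmp/receptor/awx_1/KcipiXHj/pidfiles/.
    """
    for pidfile in Path(pidfile_dir).glob("*.pid"):
        try:
            pid = int(pidfile.read_text().strip())
        except ValueError:
            continue  # ignore malformed or empty pid files
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process already exited; nothing to clean up
```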
Replaced instances of the word "pod" with "container" to avoid confusion.
I am not sure about passing the pid directory via a command line argument like this.
It seems like it would require some sort of templating / variable substitution in the receptor config.
It may be less intrusive to inject an environment variable like RECEPTOR_PID_DIRECTORY
and check for that within Runner.
> It may be less intrusive to inject an environment variable like RECEPTOR_PID_DIRECTORY and check for that within Runner.
That would have to be unique for each work unit, right? That's the weird thing to me. Otherwise, receptor may have multiple work units, some running, and some not. It needs to know which pid file came from which work unit, because it only uses this pidfile when it gets the cancel command for some specific work unit.
We can't reasonably start on a change to ansible-runner right now, because we would be throwing around speculation about what receptor would provide.
receptor
With the above proposal, we want receptor to move first, establishing (in a PoC) the environment variable that will be set, and instructions about how to write a pid file in a way that will map the pid file to the work unit id. Might the code need to create the directory? Will receptor clean up old directories?
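To illustrate the contract this asks receptor to establish, here is a hypothetical sketch in Python (receptor is Go, and the RECEPTOR_PID_DIRECTORY name is still only a proposal; the directory handling shown is an assumption, not receptor behavior):

```python
import os
import subprocess


def start_work_unit(unit_dir, worker_cmd):
    """Create a per-work-unit pidfiles directory and advertise it to the worker
    via the proposed RECEPTOR_PID_DIRECTORY environment variable."""
    pid_dir = os.path.join(unit_dir, "pidfiles")  # e.g. /tmp/receptor/<node>/<unit-id>/pidfiles
    os.makedirs(pid_dir, exist_ok=True)  # receptor would own creating this directory

    env = dict(os.environ, RECEPTOR_PID_DIRECTORY=pid_dir)
    return subprocess.Popen(worker_cmd, env=env)


# Because the pidfiles directory lives inside the work unit directory, releasing
# the work unit (which removes unit_dir) would also clean up old directories and
# keeps each pid file mapped to its work unit id.
```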
runner
Then we can develop a PoC inside of runner that will read that environment variable and pass an option to podman which will create the pid file. @chrismeyersfsu would it pass only --conmon-pidfile, or would it pass --pidfile as well?
(env) [alancoding@alan-red-hat awx]$ podman run --help | grep "pidfil"
--conmon-pidfile string Path to the file that will receive the PID of conmon
--pidfile string Write the container process ID to the file
We may need to confirm these options work the same in other podman versions.
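A sketch of what the runner-side PoC could look like, assuming the proposed RECEPTOR_PID_DIRECTORY variable and the two podman flags shown above; this is not actual ansible-runner code:

```python
import os


def add_pidfile_args(podman_args):
    """Extend a podman command line with pidfile options when the proposed
    RECEPTOR_PID_DIRECTORY environment variable is set."""
    pid_dir = os.environ.get("RECEPTOR_PID_DIRECTORY")
    if not pid_dir:
        return podman_args  # receptor did not ask for pid files; change nothing

    return podman_args + [
        "--conmon-pidfile=" + os.path.join(pid_dir, "conmon.pid"),
        "--pidfile=" + os.path.join(pid_dir, "container.pid"),
    ]
```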
awx
Or could we just trash both of those ideas and pass the new pidfile option(s) in container_options, like:
https://github.com/ansible/awx/blob/60a357eda167c2c010a76e5073dea532a8342e72/awx/main/tasks.py#L969
Then AWX would have to do some extra cleanup (when?).
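If we went the AWX route, the idea would be something like the following; the flag values and paths are purely illustrative assumptions, not AWX code:

```python
# Hypothetical illustration only: container_options is the parameter referenced
# in the tasks.py link above; the values shown are placeholders.
params = {
    "container_options": [
        "--conmon-pidfile=/tmp/receptor/awx_1/KcipiXHj/pidfiles/conmon.pid",
        "--pidfile=/tmp/receptor/awx_1/KcipiXHj/pidfiles/container.pid",
    ],
}
```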
I don't think this is a problem anymore. @AlanCoding says we have tests for this.