buildkite-plugins/docker-buildkite-plugin

Cancelled jobs leave containers running

lox opened this issue · 3 comments

lox commented

We've had reports that long-running scripts run via the docker plugin leave their container running when the job is cancelled.

I verified this with this gist: https://gist.github.com/lox/045a4b56a0c1e1c815fd011657c34b46/708574c0b0ff9ec8748d2cd736de31951da555cb

The output looks like this (screenshot not preserved):

For context: when a job is cancelled, the agent sends SIGTERM to the process group of the buildkite-agent bootstrap process that is executing the job.

It appears that docker run doesn't like being SIGTERM'd and terminates without stopping the container. My previous understanding was that docker run would proxy signals through to the container and that, despite some caveats around how PID 1 handles signals, this would work with the --init flag or a tini entrypoint. That doesn't seem to be accurate.

lox commented

moby/moby#9098 (comment):

The root cause is when using --tty signal proxying is entirely disabled with no way to enable it (even with --sig-proxy).

🤦🏼‍♂️

lox commented

The breakage was introduced in 2013, and there is an open patch at docker/cli#1841.

lox commented

I'll fix this with a pre-exit hook that calls:

docker kill --signal=SIGTERM my_container
docker rm -v my_container
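
A minimal sketch of what such a hook could look like, assuming the container was started with a known name. The `CONTAINER_NAME` variable and the `cleanup_container` function are hypothetical names used for illustration, not the plugin's actual implementation:

```shell
#!/bin/sh
# Hypothetical pre-exit hook sketch. CONTAINER_NAME is an assumed variable
# holding the name that was passed to `docker run --name`.

cleanup_container() {
  # Nothing to do if no container name was recorded.
  [ -n "$1" ] || return 0

  # Ask the container's main process to shut down. Ignore the error
  # raised when the container has already exited or never started.
  docker kill --signal=SIGTERM "$1" >/dev/null 2>&1 || true

  # Remove the container along with its anonymous volumes.
  docker rm -v "$1" >/dev/null 2>&1 || true
}

cleanup_container "${CONTAINER_NAME:-}"
```

The `|| true` guards keep the hook from failing when the container is already gone. One caveat: `docker rm` without `--force` refuses to remove a container that is still running, so a container that is slow to react to SIGTERM may need a wait or a forced removal before this step succeeds.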