Cancelled jobs leave containers running
lox opened this issue · 3 comments
We've had reports that long-running scripts run in the docker plugin leave the container running when the job is cancelled.
I verified this with this gist: https://gist.github.com/lox/045a4b56a0c1e1c815fd011657c34b46/708574c0b0ff9ec8748d2cd736de31951da555cb
For some context, the agent initiates cancellation and sends a SIGTERM to the process group of the buildkite-agent bootstrap process that is executing the job.
It appears that docker run doesn't like being SIGTERM'd and terminates without stopping the container. My previous understanding was that docker run would proxy signals through to the container, and whilst there were some caveats around how pid 1 operates, it should be ok with the --init flag or a tini entrypoint. However, that doesn't seem to be accurate.
The root cause is that when --tty is used, signal proxying is entirely disabled, with no way to enable it (even with --sig-proxy).
🤦🏼‍♂️
The breakage was introduced in 2013, and there is an open patch to moby at docker/cli#1841.
I'll fix this with a pre-exit hook that calls:
docker kill --signal=SIGTERM my_container
docker rm -v my_container
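For anyone wanting to try this before the fix lands, here is a rough sketch of what such a hook could look like, wrapping the two commands above in a function. The function name cleanup_container and the CONTAINER_NAME variable are made up for illustration and are not what the plugin actually uses. Ignoring errors with `|| true` is also an assumption on my part, so the hook doesn't fail when the container has already exited.

```shell
#!/bin/sh
# Sketch of a pre-exit hook that cleans up a container left behind by a
# cancelled job. cleanup_container and CONTAINER_NAME are hypothetical names.

cleanup_container() {
  name="$1"

  # Send SIGTERM to the container's main process, since docker run with
  # --tty will not have proxied the signal through.
  # Errors are ignored (assumption): the container may already be gone.
  docker kill --signal=SIGTERM "$name" || true

  # Remove the container and any anonymous volumes attached to it.
  docker rm -v "$name" || true
}

# Only run when a container name has been provided.
if [ -n "${CONTAINER_NAME:-}" ]; then
  cleanup_container "$CONTAINER_NAME"
fi
```

One caveat: docker rm without --force refuses to remove a container that is still running, so if the process takes a while to exit after the SIGTERM, the second command may fail and leave the stopped container behind.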