Zombie process doesn't seem to be reaped
glinton opened this issue · 1 comment
glinton commented
When ending nanobox deploy dry-run in order to deploy again, nanobox hangs on "Stopping docker container :" (3 times in a row). After investigation, it seems to be related to nanoinit failing to clean up a zombie process. It may also be related to other issues where an update to docker daemon..
docker@nanobox:~$ timeout 1m docker --debug stop a8
docker@nanobox:~$ echo $?
124
docker@nanobox:~$ docker --debug exec -it a8 bash
rpc error: code = 2 desc = oci runtime error: exec failed: exit status 1
DEBU[0000] [hijack] End of stdout
DEBU[0000] Error resize: Error response from daemon: rpc error: code = 2 desc = containerd: process not found for container
docker@nanobox:~$ docker top a8
UID PID PPID C STIME TTY TIME CMD
docker 3532 3258 0 23:00 ? 00:00:00 [logvac] <defunct>
docker@nanobox:~$ ps aux | grep 3258
root 3258 0.0 0.0 0 0 ? Ss 23:00 0:00 [nanoinit]
docker@nanobox:~$ docker info
Containers: 67
Running: 2
Paused: 0
Stopped: 65
Images: 27
Server Version: 1.12.1
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 197
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: overlay null bridge host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.24-boot2docker
Operating System: Boot2Docker 1.12.1 (TCL 7.2); master : 4b170dc - Fri Oct 7 22:28:40 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 3
Total Memory: 2.937 GiB
Name: nanobox
ID: TBAR:YLJE:IPRU:7WA4:T35Q:HMUO:E5E2:I52I:XE7U:CGIH:TZWG:G7KJ
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 32
Goroutines: 46
System Time: 2017-06-09T23:13:39.771057846Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
provider=virtualbox
Insecure Registries:
127.0.0.0/8
notxarb commented
It's hard to say, but looking at the code, it looks like if it gets to this section: https://github.com/nanopack/nanoinit/blob/master/nanoinit.c#L243-L247, it never does a wait on those children. It could get stuck there.
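
For reference, here is a minimal sketch of the kind of reaping loop an init process usually needs so that exited children don't linger as <defunct>. This is not nanoinit's actual code: the function name reap_children is hypothetical, and where it would be called from (for example a SIGCHLD handler path or the shutdown path) is an assumption. It only relies on the standard POSIX waitpid() call with WNOHANG.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from nanoinit.c.
 * Reaps every child that has already exited, without blocking.
 * Returns the number of children reaped.
 */
int reap_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        /* -1 means "any child"; WNOHANG makes the call return
         * immediately if no child has exited yet. */
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {
            /* One zombie collected; keep going, there may be more. */
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR) {
            /* Interrupted by a signal; retry. */
            continue;
        }
        /* pid == 0: children exist but none have exited yet.
         * pid == -1 with ECHILD: there are no children at all. */
        break;
    }
    return reaped;
}
```

If something like this ran on every code path that can leave exited children behind, including the shutdown path, an exited child such as logvac would presumably be collected instead of staying in the process table. That is a guess based on the symptoms above, not a confirmed fix.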