nanopack/nanoinit

Zombie process doesn't seem to be reaped

glinton opened this issue · 1 comments

While ending nanobox deploy dry-run in order to deploy again, nanobox hangs on "Stopping docker container :" (3 times in a row). After investigation, this seems to be caused by nanoinit failing to reap a zombie process. It may also be related to other issues where an update to docker daemon..

docker@nanobox:~$ timeout 1m docker --debug stop a8
docker@nanobox:~$ echo $?
124
docker@nanobox:~$ docker --debug exec -it a8 bash
rpc error: code = 2 desc = oci runtime error: exec failed: exit status 1
DEBU[0000] [hijack] End of stdout                       
DEBU[0000] Error resize: Error response from daemon: rpc error: code = 2 desc = containerd: process not found for container 
docker@nanobox:~$ docker top a8
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
docker              3532                3258                0                   23:00               ?                   00:00:00            [logvac] <defunct>
docker@nanobox:~$ ps aux | grep 3258
root      3258  0.0  0.0      0     0 ?        Ss   23:00   0:00 [nanoinit]
docker@nanobox:~$ docker info
Containers: 67
 Running: 2
 Paused: 0
 Stopped: 65
Images: 27
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 197
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: overlay null bridge host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.24-boot2docker
Operating System: Boot2Docker 1.12.1 (TCL 7.2); master : 4b170dc - Fri Oct  7 22:28:40 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 3
Total Memory: 2.937 GiB
Name: nanobox
ID: TBAR:YLJE:IPRU:7WA4:T35Q:HMUO:E5E2:I52I:XE7U:CGIH:TZWG:G7KJ
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 32
 Goroutines: 46
 System Time: 2017-06-09T23:13:39.771057846Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox
Insecure Registries:
 127.0.0.0/8

It's hard to say, but looking at the code, it looks like once it reaches this section: https://github.com/nanopack/nanoinit/blob/master/nanoinit.c#L243-L247, it never does a wait on those children. It could get stuck there.
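For illustration, here is a minimal sketch of what waiting on the remaining children could look like. It is not taken from nanoinit.c, and I'm not claiming this is the fix: `reap_all_children` and `spawn_and_reap` are hypothetical names, and the sketch assumes it is acceptable to block until every child has exited.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from nanoinit.c): block until every child of the
 * calling process has been waited on. Returns the number of children reaped,
 * or -1 on an unexpected waitpid() error. */
int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);   /* -1: wait for any child */

        if (pid > 0) {
            reaped++;            /* collected, so it can no longer linger as <defunct> */
            continue;
        }
        if (errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (errno == ECHILD)
            return reaped;       /* no children left to wait for */
        return -1;               /* unexpected error */
    }
}

/* Hypothetical demo: fork n children that exit immediately, then reap them. */
int spawn_and_reap(int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(0);            /* child: exit right away */
    }
    return reap_all_children();
}
```

Calling `spawn_and_reap(3)` should leave no `<defunct>` entries behind, since each exited child is collected by `waitpid`. A real init would probably also want a timeout or a `WNOHANG` loop driven by `SIGCHLD` so that a child that never exits cannot block shutdown forever.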