Docker 1.9.1 hanging at build step "Setting up ca-certificates-java"
jredl-va opened this issue · 258 comments
A few of us within the office upgraded to the latest version of docker toolbox backed by Docker 1.9.1 and builds are hanging as per the below build output.
docker version:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: darwin/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: linux/amd64
docker info:
Containers: 10
Images: 57
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 77
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-boot2docker
Operating System: Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015
CPUs: 1
Total Memory: 1.956 GiB
Name: vbootstrap-vm
ID: LLM6:CASZ:KOD3:646A:XPRK:PIVB:VGJ5:JSDB:ZKAN:OUC4:E2AK:FFTC
Debug mode (server): true
File Descriptors: 13
Goroutines: 18
System Time: 2015-11-24T02:03:35.597772191Z
EventsListeners: 0
Init SHA1:
Init Path: /usr/local/bin/docker
Docker Root Dir: /mnt/sda1/var/lib/docker
Labels:
provider=virtualbox
uname -a:
Darwin JRedl-MB-Pro.local 15.0.0 Darwin Kernel Version 15.0.0: Sat Sep 19 15:53:46 PDT 2015; root:xnu-3247.10.11~1/RELEASE_X86_64 x86_64
Here is a snippet from the docker build output that hangs on the Setting up ca-certificates-java line. Something to do with the latest version of docker and openjdk?
update-alternatives: using /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/tnameserv to provide /usr/bin/tnameserv (tnameserv) in auto mode
update-alternatives: using /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jexec to provide /usr/bin/jexec (jexec) in auto mode
Setting up ca-certificates-java (20140324) ...
Docker file example:
FROM gcr.io/google_appengine/base
# Prepare the image.
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y -qq --no-install-recommends build-essential wget curl unzip python python-dev php5-mysql php5-cli php5-cgi openjdk-7-jre-headless openssh-client python-openssl && apt-get clean
I can confirm that this is not an issue with Docker 1.9.0 or Docker Toolbox 1.9.0d. Let me know if I can provide any further information but this feels like a regression of some sort within the new version.
I am facing the same problem. I am investigating.
We're facing the same problem as well.
Yep, it is a problem in Docker 1.9. I downgraded to 1.8.3 and all problems were solved. Now I am investigating a workaround. Will post here! Thanks
I'm having the same issue with docker 1.9.1a
I have docker 1.8.3, so maybe the process of installing a different version of docker remedies the situation. @bsao.
Are you only seeing this on boot2docker?
I cannot repro on a stock ubuntu with aufs or on my machine. Let me try with boot2docker to see if I can repro there.
+1 in Docker 1.9.1 for ubuntu:14.10 using OSX
This is an issue that started appearing after I turned on VPN for work. Even after I turned off VPN and restarted the docker machine on OSX it continued to have this problem. I re-installed Docker 1.9.1 and then 1.8.3, still seeing the issue. Blocks me from using most if not all of my dockers on the Mac.
+1 in Docker 1.9.1 for ubuntu 12.04 using OS X 10.11
@crosbymichael I unfortunately have not tried it on any other environment than Boot2Docker.
Someone with git-bisect and Docker know-how could use the build IDs provided by @chico1198!
I experienced the same problem with 1.9.1 on OSX El Capitan, downgrading to 1.9.0 didn't help.
@crosbymichael I logged into boot2docker and ran ps auxf; this is what I saw:
root 1290 0.4 1.8 1346656 75692 ? Sl Nov27 4:53 /usr/local/bin/docker daemon -D -g /var/lib/docker -H unix:// -H tcp://0.0.0.0:2376 [...]
root 8556 0.0 0.0 0 0 ? Ss 05:12 0:00 \_ [sh]
root 24221 99.8 0.0 0 0 ? Zl 05:33 64:17 | \_ [java] <defunct>
root 24657 0.0 0.0 0 0 ? Ss 06:07 0:00 \_ [sh]
root 6174 79.6 0.0 0 0 ? Zl 06:22 12:33 \_ [java] <defunct>
root 7295 49.3 0.0 0 0 ? Zl 06:32 2:49 \_ [java] <defunct>
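For anyone puzzled by the ps output: a `<defunct>` entry is a zombie, i.e. a child that has exited but whose parent has not yet called wait() on it. A minimal sketch of the mechanism, with no Docker involved (the inner `exec` replaces the shell with `sleep`, so nothing is left to reap the already-exited background child):

```shell
# Create a zombie: the backgrounded `true` exits immediately, but its
# parent has exec'd into `sleep 2` and never reaps it, so it shows up
# in ps with state Z (<defunct>) until the parent exits.
sh -c 'true & exec sleep 2' &
parent=$!
sleep 1
state=$(ps -o stat= --ppid "$parent" | cut -c1)
echo "child state: $state"
wait "$parent"
```

In the trace above the defunct java processes are in exactly this state, except that whatever should reap them never does, so they linger until the daemon or VM is restarted.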
+1 with docker 1.9.1 on OSX 10.11 with attempting to build image from ubuntu 14.04
+1
use DockerToolbox-1.9.1a.pkg
docker version 2 master?
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: darwin/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: linux/amd64
Downgrading to Docker 1.8.3 is my temporary workaround. Here's the target I use in my Makefile.
downgrade-docker:
docker-machine ssh $(DOCKER_MACHINE_NAME) sudo /etc/init.d/docker stop
docker-machine ssh $(DOCKER_MACHINE_NAME) "while sudo /etc/init.d/docker status ; do sleep 1; done"
docker-machine ssh $(DOCKER_MACHINE_NAME) "sudo curl 'https://get.docker.com/builds/Linux/x86_64/docker-1.8.3' -o /usr/local/bin/docker-1.8.3"
docker-machine ssh $(DOCKER_MACHINE_NAME) "sudo ln -sf /usr/local/bin/docker-1.8.3 /usr/local/bin/docker"
# FIXME: Starting machine is not enough; always fails with message like "Need TLS certs for 127.0.0.1,10.0.2.15,192.168.99.100"
#docker-machine ssh $(DOCKER_MACHINE_NAME) sudo /etc/init.d/docker start
docker-machine stop $(DOCKER_MACHINE_NAME)
docker-machine start $(DOCKER_MACHINE_NAME)
I couldn't reproduce this. Does it always hang at "setting up certificates"? Did you try sending a ^D to close some pipe? Can you also try sending a SIGUSR1 to the daemon and paste the stack trace here when it's stuck?
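A note on the SIGUSR1 suggestion: it only produces a stack trace because the Docker daemon installs a handler for that signal and dumps its goroutine stacks to the daemon log; for a process without a handler, SIGUSR1 is fatal by default. A quick sketch using plain `sleep` (nothing Docker-specific), which shows why the signal must go to the daemon and not to the hung container process:

```shell
# SIGUSR1's default disposition terminates the target process.
# The shell reports death-by-signal as exit status 128 + signum,
# and SIGUSR1 is signal 10 on x86-64 Linux, giving 138.
sleep 30 &
pid=$!
kill -USR1 "$pid"
wait "$pid" && status=0 || status=$?
echo "exit status: $status"
```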
+1 with docker 1.9.1 on OS X 10.10
I tried downgrading to 1.8.3 using @osterman's Makefile and also had troubles with the SSH key:
ip-10-100-0-211:docker-dev leaf$ docker-machine start default
(default) OUT | Starting VM...
Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded
Tested it by doing different openjdk installs inside debian:jessie and ubuntu
OSX 10.11.1, boot2docker 1.9.1: hangs
OSX 10.11.1, boot2docker 1.9.0: works
Ubuntu 14.04 with docker 1.9.1: works
The boot2docker vms were created with:
docker-machine create -d virtualbox --virtualbox-boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v1.9.0/boot2docker.iso
and
docker-machine create -d virtualbox --virtualbox-boot2docker-url=https://github.com/boot2docker/boot2docker/releases/download/v1.9.1/boot2docker.iso
On Ubuntu 14.04 docker was installed following the documentation on https://docs.docker.com/engine/installation/ubuntulinux/
I can't reproduce this.
Same issue here.
Is there any way to downgrade to an earlier version on Windows?
Found it myself.
https://github.com/docker/docker/releases
+1, docker 1.9.1 @ El Capitan
+1, Docker 1.9.1 on OS X 10.11.1
+1, Docker 1.9.1a, OS X 10.10.5
+1
Same on Docker-machine on OSX 10.11.1
Docker version 1.9.1, build a34a1d5
docker-machine version 0.5.1 (HEAD)
I'm able to reproduce this on docker-machine, OS X 10.10.5, so this may be something related to boot2docker. docker top also gives me <defunct> for a java process;
docker top dreamy_sammet Tue Dec 1 15:58:47 2015
UID PID PPID C STIME TTY TIME CMD
root 2538 1023 0 14:44 ? 00:00:00 /bin/sh -c apt-get update && apt-get install -y -qq --no-install-recommends build-essential wget curl unzip python python-dev php5-mysql php5-cli php5-cgi openjdk-7-jre-headless openssh-client python-openssl && apt-get clean
root 2566 2538 1 14:44 ? 00:00:16 apt-get install -y -qq --no-install-recommends build-essential wget curl unzip python python-dev php5-mysql php5-cli php5-cgi openjdk-7-jre-headless openssh-client python-openssl
root 4830 2566 0 14:46 pts/0 00:00:00 /usr/bin/dpkg --status-fd 14 --configure libgdbm3:amd64 libjson-c2:amd64 libbsd0:amd64 libedit2:amd64 libkeyutils1:amd64 libkrb5support0:amd64 libk5crypto3:amd64 libkrb5-3:amd64 libgssapi-krb5-2:amd64 libidn11:amd64 libsasl2-modules-db:amd64 libsasl2-2:amd64 libldap-2.4-2:amd64 libmagic1:amd64 libsqlite3-0:amd64 libwrap0:amd64 libxml2:amd64 perl-modules:all perl:amd64 mime-support:all libexpat1:amd64 libpython2.7-stdlib:amd64 python2.7:amd64 libpython-stdlib:amd64 python:amd64 libasan1:amd64 libasyncns0:amd64 libatomic1:amd64 libavahi-common-data:amd64 libavahi-common3:amd64 libdbus-1-3:amd64 libavahi-client3:amd64 libcilkrts5:amd64 libisl10:amd64 libcloog-isl4:amd64 libcups2:amd64 librtmp1:amd64 libssh2-1:amd64 libcurl3:amd64 libogg0:amd64 libflac8:amd64 libpng12-0:amd64 libfreetype6:amd64 ucf:all fonts-dejavu-core:all fontconfig-config:all libfontconfig1:amd64 libglib2.0-0:amd64 libgomp1:amd64 x11-common:all libice6:amd64 libicu52:amd64 libitm1:amd64 liblcms2-2:amd64 liblsan0:amd64 libmpfr4:amd64 mysql-common:all libmysqlclient18:amd64 libnspr4:amd64 libnss3:amd64 libonig2:amd64 libpcsclite1:amd64 libsm6:amd64 libvorbis0a:amd64 libvorbisenc2:amd64 libsndfile1:amd64 libxau6:amd64 libxdmcp6:amd64 libxcb1:amd64 libx11-data:all libx11-6:amd64 libx11-xcb1:amd64 libxext6:amd64 libxi6:amd64 libxtst6:amd64 libpulse0:amd64 libpython2.7:amd64 libc-dev-bin:amd64 linux-libc-dev:amd64 libc6-dev:amd64 libexpat1-dev:amd64 libpython2.7-dev:amd64 libquadmath0:amd64 libsctp1:amd64 libtsan0:amd64 libubsan0:amd64 tzdata-java:all java-common:all libjpeg62-turbo:amd64 ca-certificates-java:all openjdk-7-jre-headless:amd64 libmpc3:amd64 libpsl0:amd64 wget:amd64 bzip2:amd64 libperl4-corelibs-perl:all lsof:amd64 openssh-client:amd64 patch:amd64 xz-utils:amd64 binutils:amd64 cpp-4.9:amd64 cpp:amd64 libgcc-4.9-dev:amd64 gcc-4.9:amd64 gcc:amd64 libstdc++-4.9-dev:amd64 g++-4.9:amd64 g++:amd64 make:amd64 libtimedate-perl:all libdpkg-perl:all dpkg-dev:all 
build-essential:amd64 curl:amd64 libpython-dev:amd64 libqdbm14:amd64 psmisc:amd64 php5-common:amd64 php5-json:amd64 php5-cli:amd64 php5-cgi:amd64 php5-mysql:amd64 python-ply:all python-pycparser:all python-cffi:amd64 python-pkg-resources:all python-six:all python-cryptography:amd64 python2.7-dev:amd64 python-dev:amd64 python-openssl:all unzip:amd64
root 6711 4830 0 14:46 pts/0 00:00:00 /bin/bash /var/lib/dpkg/info/ca-certificates-java.postinst configure
root 6725 6711 97 14:46 pts/0 00:12:25 [java] <defunct>
/cc @tianon @nathanleclaire @JeffDM perhaps any of you has an idea where to look, or what to debug, I couldn't really find something
Looks like memory is not the problem; however, the <defunct> process does consume 100% CPU;
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
d263da116bfd 99.51% 689.3 MB / 2.1 GB 32.82% 157.9 MB / 2.754 MB 25.15 MB / 130.4 MB
The container seems to be stuck as well, and I had to reboot the VM to get it killed
+1 Docker version 1.9.1, build a34a1d5, Win 7.
I've run into similar problems that turned out to be OOM, even though the stats command shows memory available to the container. The problem happened soon after task manager showed 0 free physical memory, while stats continued to show <100%.
Weird thing is that the process kept running, so it was not killed. I can retry with -m; however, it's strange that this happens on 1.9.x but (following this discussion) not on 1.8. Also, running the same on a 1GB DigitalOcean droplet (also 1.9.1) succeeded. Perhaps that one uses swap; I should check that
It actually kept happening to me after I uninstalled 1.9.1 and installed 1.8.3. Looked like the uninstall wasn't very thorough though on Mac because firing up the shell was without delay on 1.8.3, unlike a normal first run where it sets up ssh keys and stuff.
USER POLL
The best way to get notified when there are changes in this discussion is by clicking the Subscribe button in the top right.
The people listed below have appreciated your meaningful discussion with a random +1:
31 participants on this issue and counting.
@thaJeztah I didn't mean to offend nor be unconstructive. I meant to draw attention to the fact that GitHub shows the number of people participating, and I gathered that @GordonTheTurtle wanted to construct a list of people who have done +1. Maybe I was confused by what he meant. In any case, I watch this issue with great anticipation, since it has affected me on more than one occasion in the past weeks. I am glad we have information from various users.
I am able to duplicate this issue on my setup (using Docker Machine on Mac).
Here are my findings so far.
As noted by other posters, the simplest way to duplicate this has been to use the boot2docker 1.9.1 ISO with AUFS. This Dockerfile should minimally reproduce the problem fairly quickly:
FROM debian:jessie
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-7-jre-headless
Looking at dmesg, I see some AUFS errors after attempting such a build, but I am not 100% sure they are related:
docker@default:~$ dmesg | tail
aufs au_opts_verify:1597:docker[14186]: dirperm1 breaks the protection by the permission bits on the lower branch
aufs au_opts_verify:1597:docker[14186]: dirperm1 breaks the protection by the permission bits on the lower branch
aufs au_opts_verify:1597:docker[14186]: dirperm1 breaks the protection by the permission bits on the lower branch
device veth955cc15 entered promiscuous mode
IPv6: ADDRCONF(NETDEV_UP): veth955cc15: link is not ready
eth0: renamed from vethc63e038
IPv6: ADDRCONF(NETDEV_CHANGE): veth955cc15: link becomes ready
docker0: port 2(veth955cc15) entered forwarding state
docker0: port 2(veth955cc15) entered forwarding state
docker0: port 2(veth955cc15) entered forwarding state
If I create a Docker 1.9.1 machine which uses overlay as the storage driver:
$ docker-machine create -d virtualbox --engine-storage-driver overlay overlay
The process does NOT hang and this line runs successfully! Looks like AUFS and/or kernel is the problem.
boot2docker/boot2docker did bump both kernel versions and AUFS commit for the 1.9.1 release, so those are both factors which need to be ruled out or investigated further:
- boot2docker/boot2docker@v1.9.0...master#diff-3254677a7917c6c01f55212f86c57fbfR23
- boot2docker/boot2docker@v1.9.0...master#diff-3254677a7917c6c01f55212f86c57fbfR32
Currently trying the 1.9.0 ISO with a 1.9.1 binary to see if the surface area of the potential bug can be reduced further.
The Dockerfile will build fine and not hang on a boot2docker 1.9.0 ISO with a Docker 1.9.1 binary. The issue seems not to lie with Docker 1.9.1, but rather the environment in which it is being run.
I am using the 1.9.1 release with no issue on aufs, but have significantly more cpu/ram/storage than the default machine config.
I just tried raising the memory to 4GB for my VM, but still able to reproduce
@cpuguy83 AUFS on boot2docker 1.9.1?
As noted above, b2d bundles a very specific version of AUFS.
Yep
Containers: 13
Images: 191
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 221
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-boot2docker
Operating System: Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015
CPUs: 1
Total Memory: 3.859 GiB
Name: default
ID: XMQH:4YAW:ZDSA:OWC7:GAPC:US5P:YQ4M:SVMQ:VXNL:RRZC:YNHT:ZBHE
Debug mode (server): true
File Descriptors: 12
Goroutines: 19
System Time: 2015-12-01T23:05:28.760107918Z
EventsListeners: 0
Init SHA1:
Init Path: /usr/local/bin/docker
Docker Root Dir: /mnt/sda1/var/lib/docker
Labels:
provider=virtualbox
I also see some java processes becoming defunct in a container. I am able to reproduce this issue with the following steps
run the container:
docker run --rm -it myJavaContainerFromCentos7 bash
create Foo.java with the following:
class Foo {
    public static void main(String[] a) {
        System.out.println("hello world");
    }
}
Compiling and running it results in a defunct java process, with 1 core using 100% CPU:
javac Foo.java && java Foo
However... if a System.exit(0); is added after the println, everything is OK:
class Foo {
    public static void main(String[] a) {
        System.out.println("hello world");
        System.exit(0); // clean exit, no hang
    }
}
version info:
osx 10.10.3
docker 1.9.1
boot2docker version 1.9.1 uname -a is "linux ci 4.1.13-boot2docker"
numproc = 1
strace output with System.exit(0);
open("/usr/java/jdk1.7.0_75/jre/lib/amd64/jvm.cfg", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=677, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f27b1dab000
read(3, "# Copyright (c) 2003, Oracle and"..., 4096) = 677
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7f27b1dab000, 4096) = 0
stat("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
futex(0x7f27b17580d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\245\36\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
mmap(NULL, 15167976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f27b031c000
mprotect(0x7f27b0e8f000, 2097152, PROT_NONE) = 0
mmap(0x7f27b108f000, 802816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb73000) = 0x7f27b108f000
mmap(0x7f27b1153000, 262632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f27b1153000
close(3) = 0
open("/usr/java/jdk1.7.0_75/bin/../lib/amd64/jli/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=11922, ...}) = 0
mmap(NULL, 11922, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f27b1da9000
close(3) = 0
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260T\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1141552, ...}) = 0
mmap(NULL, 3150168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f27b001a000
mprotect(0x7f27b011b000, 2093056, PROT_NONE) = 0
mmap(0x7f27b031a000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x100000) = 0x7f27b031a000
close(3) = 0
mprotect(0x7f27b031a000, 4096, PROT_READ) = 0
munmap(0x7f27b1da9000, 11922) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f27b1ca4000
mprotect(0x7f27b1ca4000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f27b1da3fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f27b1da49d0, tls=0x7f27b1da4700, child_tidptr=0x7f27b1da49d0) = 118
futex(0x7f27b1da49d0, FUTEX_WAIT, 118, NULLhellowerld
<unfinished ...>
+++ exited with 0 +++
strace output without System.exit(0);
open("/usr/java/jdk1.7.0_75/jre/lib/amd64/jvm.cfg", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0755, st_size=677, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fac9a490000
read(3, "# Copyright (c) 2003, Oracle and"..., 4096) = 677
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7fac9a490000, 4096) = 0
stat("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
futex(0x7fac99e3d0d0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
open("/usr/java/jdk1.7.0_75/jre/lib/amd64/server/libjvm.so", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240\245\36\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=15224066, ...}) = 0
mmap(NULL, 15167976, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac98a01000
mprotect(0x7fac99574000, 2097152, PROT_NONE) = 0
mmap(0x7fac99774000, 802816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xb73000) = 0x7fac99774000
mmap(0x7fac99838000, 262632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac99838000
close(3) = 0
open("/usr/java/jdk1.7.0_75/bin/../lib/amd64/jli/libm.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=11922, ...}) = 0
mmap(NULL, 11922, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fac9a48e000
close(3) = 0
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260T\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1141552, ...}) = 0
mmap(NULL, 3150168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac986ff000
mprotect(0x7fac98800000, 2093056, PROT_NONE) = 0
mmap(0x7fac989ff000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x100000) = 0x7fac989ff000
close(3) = 0
mprotect(0x7fac989ff000, 4096, PROT_READ) = 0
munmap(0x7fac9a48e000, 11922) = 0
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fac9a389000
mprotect(0x7fac9a389000, 4096, PROT_NONE) = 0
clone(child_stack=0x7fac9a488fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fac9a4899d0, tls=0x7fac9a489700, child_tidptr=0x7fac9a4899d0) = 142
futex(0x7fac9a4899d0, FUTEX_WAIT, 142, NULLhellowerld
) = 0
exit_group(0) = ?
the process is now hung but you can enter the container:
docker exec -it myContainer bash
and see the following:
ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 23:47 ? 00:00:00 bash
root 138 1 0 23:51 ? 00:00:00 strace java Foo
root 141 138 24 23:51 ? 00:01:21 [java] <defunct>
root 151 0 1 23:57 ? 00:00:00 bash
root 167 151 0 23:57 ? 00:00:00 ps -ef
quick look at stats:
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O
myContainer 24.72% 64.18 MB / 8.365 GB 0.77% 11.09 MB / 202.6 kB 8.192 kB / 14.99
Everything works fine in 1.8.3.
+1, Docker version 1.9.1, build a34a1d5, OS X 10.10.5, Docker Machine Version: 0.5.1 (HEAD)
+1
Docker version 1.9.1, build a34a1d5
, OS X 10.11.1 (15B42)
This issue really is quite bizarre. If I strace the failing apt-get command, the end of the output is:
stat("/etc/apt/sources.list", {st_mode=S_IFREG|0644, st_size=161, ...}) = 0
open("/etc/apt/sources.list", O_RDONLY) = 5
read(5, "deb http://httpredir.debian.org/"..., 8191) = 161
pipe([6, 7]) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fc6fc88aa10) = 14
close(7) = 0
fcntl(6, F_GETFL) = 0 (flags O_RDONLY)
fstat(6, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc6fc892000
lseek(6, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
read(6, Process 14 attached
<unfinished ...>
[pid 14] rt_sigaction(SIGPIPE, {SIG_DFL, [PIPE], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_IGN, [PIPE], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, 8) = 0
[pid 14] rt_sigaction(SIGQUIT, {SIG_DFL, [QUIT], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid 14] rt_sigaction(SIGINT, {SIG_DFL, [INT], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid 14] rt_sigaction(SIGWINCH, {SIG_DFL, [WINCH], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {0x7fc6fc0e5750, [WINCH], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, 8) = 0
[pid 14] rt_sigaction(SIGCONT, {SIG_DFL, [CONT], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid 14] rt_sigaction(SIGTSTP, {SIG_DFL, [TSTP], SA_RESTORER|SA_RESTART, 0x7fc6fb531180}, {SIG_DFL, [], 0}, 8) = 0
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(3, F_SETFD, FD_CLOEXEC) = 0
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(4, F_SETFD, FD_CLOEXEC) = 0
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(6, F_SETFD, FD_CLOEXEC) = 0
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(7, F_SETFD, FD_CLOEXEC) = 0
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(8, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(9, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(10, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
[pid 14] getrlimit(RLIMIT_NOFILE, {rlim_cur=1024*1024, rlim_max=1024*1024}) = 0
[pid 14] fcntl(11, F_SETFD, FD_CLOEXEC) = -1 EBADF (Bad file descriptor)
Where those (Bad file descriptor) errors continue to loop indefinitely.
RLIMIT_NOFILE
Specifies a value one greater than the maximum file descriptor
number that can be opened by this process. Attempts (open(2),
pipe(2), dup(2), etc.) to exceed this limit yield the error
EMFILE. (Historically, this limit was named RLIMIT_OFILE on
BSD.)
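Those EBADF results may not indicate a failure at all: the trace looks like the child walking every candidate descriptor up to RLIMIT_NOFILE, setting FD_CLOEXEC on each (EBADF just means that fd isn't open). Whether or not that is the root cause here, with the limit at 1024*1024 as shown in the trace, the walk alone is a couple of million syscalls per fork, which can look like a hang. A rough sketch of the arithmetic:

```shell
# Each probed descriptor costs a getrlimit + fcntl pair in the trace,
# so at the 1024*1024 soft limit that is ~2M syscalls per fork.
rlim=$((1024 * 1024))
syscalls=$((2 * rlim))
echo "fds walked: $rlim, syscalls: $syscalls"
# for comparison, the soft limit in this shell:
ulimit -n
```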
SIGPIPE is failing? This might correspond to my previous post where I saw java "hello world" causing zombie processes without an explicit "System.exit(0);" -- or maybe that's a completely different issue. If so, sorry for the noise.
What happens to your CPU while looping indefinitely?
@andrewgdavis It's at 100%
java "hello world" causing zombie processes without an explicit "System.exit(0);"
That certainly sounds similar to the problem encountered here.
I can definitely confirm the b2d issue (even did the bisect to track it down conclusively to the 4.1.13 kernel bump). I can also reproduce on 4.2.6 with b2d.
As an additional kink, my Gentoo host is currently on 4.1.13 + AUFS patches also, and I'm seeing the same exact problem, so we've definitely ruled out anything b2d-specific.
I think it might be worth trawling through commits between 4.1.12 and 4.1.13 to see if anything that might be related jumps out.
(ie, https://www.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.1.13)
Yup, something breaks from kernel 4.1.12 => 4.1.13. I can confirm that baking a boot2docker ISO with the former (4.1.12) doesn't trip this bug, but the latter (4.1.13) does.
So, it's not specifically related to boot2docker, but seems to be related to the kernel version interacting with AUFS.
I've got a harebrained theory...
The commit above makes a change in filemap.c to generic_perform_write(struct file *file, struct iov_iter *i, loff_t pos).
Below is the chunk of code I personally want to test, because the comment describes both deadlock and livelock race conditions, and I see the CPU pegged at 100%. But that's just me and my jump-to-conclusions mat.
4.1.13 mm/filemap.c#l_2448
...
2448 again:
2449 /*
2450 * Bring in the user page that we will copy from _first_.
2451 * Otherwise there's a nasty deadlock on copying from the
2452 * same page as we're writing to, without it being marked
2453 * up-to-date.
2454 *
2455 * Not only is this an optimisation, but it is also required
2456 * to check that the address is actually valid, when atomic
2457 * usercopies are used, below.
2458 */
2459 if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
2460 status = -EFAULT;
2461 break;
2462 }
2463
2464 if (fatal_signal_pending(current)) {
2465 status = -EINTR;
2466 break;
2467 }
2468
2469 status = a_ops->write_begin(file, mapping, pos, bytes, flags,
2470 &page, &fsdata);
2471 if (unlikely(status < 0))
2472 break;
2473
2474 if (mapping_writably_mapped(mapping))
2475 flush_dcache_page(page);
2476
2477 copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
2478 flush_dcache_page(page);
2479
2480 status = a_ops->write_end(file, mapping, pos, bytes, copied,
2481 page, fsdata);
2482 if (unlikely(status < 0))
2483 break;
2484 copied = status;
2485
2486 cond_resched();
2487
2488 iov_iter_advance(i, copied);
2489 if (unlikely(copied == 0)) {
2490 /*
2491 * If we were unable to copy any data at all, we must
2492 * fall back to a single segment length write.
2493 *
2494 * If we didn't fallback here, we could livelock
2495 * because not all segments in the iov can be copied at
2496 * once without a pagefault.
2497 */
2498 bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
2499 iov_iter_single_seg_count(i));
2500 goto again;
2501 }
2502 pos += copied;
2503 written += copied;
2504
2505 balance_dirty_pages_ratelimited(mapping);
2506 } while (iov_iter_count(i));
@andrewgdavis one could use that commit during git bisect as a specific testing point!
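For anyone unfamiliar with the workflow being proposed: `git bisect run` automates the search once you have a pass/fail test. The real test here would be "build the kernel at this commit, bake it into a boot2docker ISO, run the openjdk repro", which is slow but mechanical. A toy illustration of just the bisect mechanics, on a synthetic repo where a file named `state` stands in for the kernel build + repro:

```shell
# Toy bisect: ten commits, the "regression" lands at commit 7.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" config user.email demo@example.com
git -C "$repo" config user.name demo
for i in $(seq 1 10); do
  echo "$i" > "$repo/state"
  git -C "$repo" add state
  git -C "$repo" commit -qm "commit $i"
done
cd "$repo"
git bisect start HEAD HEAD~9 >/dev/null        # bad = newest, good = oldest
# exit 0 = good, non-zero = bad; stands in for "boot ISO, run the repro"
git bisect run sh -c 'test "$(cat state)" -lt 7' >/dev/null 2>&1
first_bad=$(git show -s --format=%s refs/bisect/bad)
echo "first bad: $first_bad"
git bisect reset >/dev/null 2>&1
```

With roughly 500 commits between v4.1.12 and v4.1.13, a full bisect would take about 9 kernel builds.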
Seeing a similar hang when shutting down mongodb. Definitely present in 1.9.x. Not present in 1.8.x.
I've been able to solve this issue for myself by increasing the docker-machine VM's memory from 1024 to 2048 MB and assigning 2 CPUs instead of 1.
Works:
VM: Ubuntu 14.04 (2gb ram)
Docker Engine: 1.9.1
Docker base image: ubuntu:latest
Does not work:
VM: Ubuntu 15.10 (2 gb ram)
Docker Engine: 1.9.1,1.9.0,1.8.3
Docker base image: ubuntu:latest, ubuntu:14.04
@marsinvasion If possible, can you print the output of uname -a on both tested systems?
+1
Docker version 1.9.1, build a34a1d5 on OS X 10.11.1
Encountered on OS X 10.9.5 with docker 1.9.1.
Inspired by @marsinvasion, I got a successful workaround by giving my docker-machine 2 CPUs and 4096 MB RAM.
Oops, spoke too soon. It stopped working upon changing a Dockerfile I'm working on and re-running the build.
Also seeing this hellacious bug (docker-machine boot2docker 1.9.1 on OS X), from a previously building ubuntu:15.04 image. It seems to require restarting my docker server to get those zombie containers to go away.
I thought docker-library/openjdk#19 was related but maybe not, here we're getting a hang, there they got an error about not finding "java".
Switched my server to overlay as a workaround. Before that it created a bunch of zombie containers as well.
Docker version 1.9.1, build a34a1d5 on OS X 10.11.1
Anyone know what's involved in migrating an existing boot2docker.iso system to https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/ or is it easier to do a full rebuild? That page has ominous warnings about CentOS image builds -- what are the "yum" workarounds, is it related to #10180?
It's fixed in 1.9.1a - install this if you're on OSX - https://github.com/docker/toolbox/releases/download/v1.9.1a/DockerToolbox-1.9.1a.pkg
Definitely not fixed by Docker Toolbox 1.9.1a. Suffering from this bug with that version. Looking back through the comments, it looks like I'm not the only one.
Nope, still not building.
I had to delete the VM in virtualbox and start from scratch for it to work.
Also, tried deleting and creating a new VM several times to no avail.
Installed 1.9.1a, did docker-machine rm default, and used Docker Quickstart Terminal to regenerate the default machine. Rebuilt images (that derive from java:7-jre) and ran them; still does not work. Continues to work just fine with the overlay machine built as suggested above:
$ docker-machine create -d virtualbox --engine-storage-driver overlay overlay
^thanks! I can confirm the overlay machine is working.
Using overlay as the engine storage driver also worked for fixing the MongoDB shutdown hang.
You can work around the Dockerfile build failure by installing Oracle Java instead of OpenJDK:
# Oracle java is bulkier but avoids boot2docker/aufs docker issue 18180
RUN apt-get install -y software-properties-common python-software-properties && add-apt-repository -y ppa:webupd8team/java && apt-get update
RUN echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections
RUN apt-get install -y oracle-java8-installer && apt-get install -y oracle-java8-set-default
But I was underestimating the scope of the problem: boot2docker 1.9.1 leads to zombie java processes even on CentOS containers where openjdk installs fine.
root 322 11.1 0.0 0 0 ? Zsl 18:43 29:48 [java] <defunct>
I'm unable to configure my docker server with --engine-storage-driver overlay because I build CentOS-based images, and overlayfs is not compatible with yum (#10180).
I'm sure Docker folks would not recommend this, but the way I moved past this blocking issue is by building a boot2docker.iso that uses docker 1.9.1 with a slightly older AUFS. Instructions in boot2docker/boot2docker#1099 (comment).
Tried Oracle jdk1.7.0_75 and jdk1.8.0_65; both hang and create a defunct java process.
FROM : #10589
@neverfox exactly the same problem here, with the same image +1
~ docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.5.1
Git commit: a34a1d5
Built: Sat Nov 21 00:49:19 UTC 2015
OS/Arch: darwin/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: linux/amd64
~ docker-machine inspect default
{
"ConfigVersion": 3,
"Driver": {
"Driver": {
"VBoxManager": {},
"IPAddress": "192.168.99.100",
"MachineName": "default",
"SSHUser": "docker",
"SSHPort": 61012,
"SSHKeyPath": "/Users/myuser/.docker/machine/machines/default/id_rsa",
"StorePath": "/Users/myuser/.docker/machine",
"SwarmMaster": false,
"SwarmHost": "tcp://0.0.0.0:3376",
"SwarmDiscovery": "",
"CPU": 1,
"Memory": 4096,
"DiskSize": 20000,
"Boot2DockerURL": "",
"Boot2DockerImportVM": "",
"HostOnlyCIDR": "192.168.99.1/24",
"HostOnlyNicType": "82540EM",
"HostOnlyPromiscMode": "deny",
"NoShare": false
},
"Locker": {}
},
"DriverName": "virtualbox",
"HostOptions": {
"Driver": "",
"Memory": 0,
"Disk": 0,
"EngineOptions": {
"ArbitraryFlags": [],
"Dns": null,
"GraphDir": "",
"Env": [],
"Ipv6": false,
"InsecureRegistry": [],
"Labels": [],
"LogLevel": "",
"StorageDriver": "",
"SelinuxEnabled": false,
"TlsVerify": true,
"RegistryMirror": [],
"InstallURL": "https://get.docker.com"
},
"SwarmOptions": {
"IsSwarm": false,
"Address": "",
"Discovery": "",
"Master": false,
"Host": "tcp://0.0.0.0:3376",
"Image": "swarm:latest",
"Strategy": "spread",
"Heartbeat": 0,
"Overcommit": 0,
"ArbitraryFlags": [],
"Env": null
},
"AuthOptions": {
"CertDir": "/Users/myuser/.docker/machine/certs",
"CaCertPath": "/Users/myuser/.docker/machine/certs/ca.pem",
"CaPrivateKeyPath": "/Users/myuser/.docker/machine/certs/ca-key.pem",
"CaCertRemotePath": "",
"ServerCertPath": "/Users/myuser/.docker/machine/machines/default/server.pem",
"ServerKeyPath": "/Users/myuser/.docker/machine/machines/default/server-key.pem",
"ClientKeyPath": "/Users/myuser/.docker/machine/certs/key.pem",
"ServerCertRemotePath": "",
"ServerKeyRemotePath": "",
"ClientCertPath": "/Users/myuser/.docker/machine/certs/cert.pem",
"StorePath": "/Users/myuser/.docker/machine/machines/default"
}
},
"Name": "default",
"RawDriver": "eyJWQm94TWFuYWdlciI6e30sIklQQWRkcmVzcyI6IjE5Mi4xNjguOTkuMTAwIiwiTWFjaGluZU5hbWUiOiJkZWZhdWx0IiwiU1NIVXNlciI6ImRvY2tlciIsIlNTSFBvcnQiOjYxMDEyLCJTU0hLZXlQYXRoIjoiL1VzZXJzL2RhdmlkZnJhbmNvZXVyLy5kb2NrZXIvbWFjaGluZS9tYWNoaW5lcy9kZWZhdWx0L2lkX3JzYSIsIlN0b3JlUGF0aCI6Ii9Vc2Vycy9kYXZpZGZyYW5jb2V1ci8uZG9ja2VyL21hY2hpbmUiLCJTd2FybU1hc3RlciI6ZmFsc2UsIlN3YXJtSG9zdCI6InRjcDovLzAuMC4wLjA6MzM3NiIsIlN3YXJtRGlzY292ZXJ5IjoiIiwiQ1BVIjoxLCJNZW1vcnkiOjQwOTYsIkRpc2tTaXplIjoyMDAwMCwiQm9vdDJEb2NrZXJVUkwiOiIiLCJCb290MkRvY2tlckltcG9ydFZNIjoiIiwiSG9zdE9ubHlDSURSIjoiMTkyLjE2OC45OS4xLzI0IiwiSG9zdE9ubHlOaWNUeXBlIjoiODI1NDBFTSIsIkhvc3RPbmx5UHJvbWlzY01vZGUiOiJkZW55IiwiTm9TaGFyZSI6ZmFsc2V9"
}
~ docker inspect 74
[
{
"Id": "7471b734d7e7e47270511453a04d903c974cba77a2a0d259255355a653f95e04",
"Created": "2015-11-27T13:23:11.515987776Z",
"Path": "/docker-entrypoint.sh",
"Args": [
"cassandra",
"-f"
],
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 1263,
"ExitCode": 0,
"Error": "",
"StartedAt": "2015-11-27T13:23:11.612899257Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Image": "338a92b912e4d5a84c4f399a9475a1476f8226eff85c2592c8e80ba58b13d225",
"ResolvConfPath": "/mnt/sda1/var/lib/docker/containers/7471b734d7e7e47270511453a04d903c974cba77a2a0d259255355a653f95e04/resolv.conf",
"HostnamePath": "/mnt/sda1/var/lib/docker/containers/7471b734d7e7e47270511453a04d903c974cba77a2a0d259255355a653f95e04/hostname",
"HostsPath": "/mnt/sda1/var/lib/docker/containers/7471b734d7e7e47270511453a04d903c974cba77a2a0d259255355a653f95e04/hosts",
"LogPath": "/mnt/sda1/var/lib/docker/containers/7471b734d7e7e47270511453a04d903c974cba77a2a0d259255355a653f95e04/7471b734d7e7e47270511453a04d903c974cba77a2a0d259255355a653f95e04-json.log",
"Name": "/pensive_kalam",
"RestartCount": 0,
"Driver": "aufs",
"ExecDriver": "native-0.2",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "",
"ExecIDs": null,
"HostConfig": {
"Binds": null,
"ContainerIDFile": "",
"LxcConf": [],
"Memory": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"KernelMemory": 0,
"CpuShares": 0,
"CpuPeriod": 0,
"CpusetCpus": "",
"CpusetMems": "",
"CpuQuota": 0,
"BlkioWeight": 0,
"OomKillDisable": false,
"MemorySwappiness": -1,
"Privileged": false,
"PortBindings": {},
"Links": null,
"PublishAllPorts": false,
"Dns": [],
"DnsOptions": [],
"DnsSearch": [],
"ExtraHosts": null,
"VolumesFrom": null,
"Devices": [],
"NetworkMode": "default",
"IpcMode": "",
"PidMode": "",
"UTSMode": "",
"CapAdd": null,
"CapDrop": null,
"GroupAdd": null,
"RestartPolicy": {
"Name": "no",
"MaximumRetryCount": 0
},
"SecurityOpt": null,
"ReadonlyRootfs": false,
"Ulimits": null,
"LogConfig": {
"Type": "json-file",
"Config": {}
},
"CgroupParent": "",
"ConsoleSize": [
0,
0
],
"VolumeDriver": ""
},
"GraphDriver": {
"Name": "aufs",
"Data": null
},
"Mounts": [
{
"Name": "2249b03f9a598e5ac3f306983877292baa299c4499c9db77eb9bfcb88fd2f541",
"Source": "/mnt/sda1/var/lib/docker/volumes/2249b03f9a598e5ac3f306983877292baa299c4499c9db77eb9bfcb88fd2f541/_data",
"Destination": "/var/lib/cassandra",
"Driver": "local",
"Mode": "",
"RW": true
}
],
"Config": {
"Hostname": "7471b734d7e7",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": true,
"AttachStderr": true,
"ExposedPorts": {
"7000/tcp": {},
"7001/tcp": {},
"7199/tcp": {},
"9042/tcp": {},
"9160/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"CASSANDRA_VERSION=2.1.11",
"CASSANDRA_CONFIG=/etc/cassandra"
],
"Cmd": [
"cassandra",
"-f"
],
"Image": "cassandra:2.1.11",
"Volumes": {
"/var/lib/cassandra": {}
},
"WorkingDir": "",
"Entrypoint": [
"/docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {},
"StopSignal": "SIGTERM"
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "e2f074e4b10e67cd7ac22d6e73d50304fc3f0a68d67c7fee6d7f8d647c9eb9b1",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {
"7000/tcp": null,
"7001/tcp": null,
"7199/tcp": null,
"9042/tcp": null,
"9160/tcp": null
},
"SandboxKey": "/var/run/docker/netns/e2f074e4b10e",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "63596aa5ec20516d477921fec4197d086b4dd4f1ad25014b5ddf027b82891966",
"Gateway": "172.17.0.1",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"MacAddress": "02:42:ac:11:00:02",
"Networks": {
"bridge": {
"EndpointID": "63596aa5ec20516d477921fec4197d086b4dd4f1ad25014b5ddf027b82891966",
"Gateway": "172.17.0.1",
"IPAddress": "172.17.0.2",
"IPPrefixLen": 16,
"IPv6Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"MacAddress": "02:42:ac:11:00:02"
}
}
}
}
]
I simply ran docker run -it cassandra:2.1.11 and the terminal got stuck, with no way to stop the container. You have to stop the whole VM.
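For anyone trying to reproduce this without losing their terminal, a detached run keeps the shell usable (a sketch; assumes a boot2docker 1.9.1 machine named "default" and the container name "cass-repro" is arbitrary):

```shell
# Start the container detached so the shell stays usable
docker run -d --name cass-repro cassandra:2.1.11

# Once java goes defunct, docker stop / docker kill hang as well;
# the only recovery observed in this thread is restarting the VM
docker-machine restart default
```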
+1
Was able to duplicate the issue earlier today on Docker 1.9.1 running Mac OS X 10.11.1 (15B42).
Was able to get around it by installing Docker 1.9.0.
_Apologies for lack of information was on my work machine earlier during the day - will provide updated information at later time_
Same here with Docker 1.9.1 and OS X 10.11.
For people having this issue
We've so far narrowed this down to being not a Docker bug, but a kernel issue in combination with the AUFS support in the kernel used by the current boot2docker version; see #18180 (comment)
- If you want to stay informed on progress, use the subscribe button on this page. Do not comment unless you have new information that may help resolve this issue.
- If you want to help resolve this, performing a git bisect of the kernel may help; see #18180 (comment)
- Remember that each comment sends out more than 2000 e-mails to subscribers, and countless puppies will die
Just tested Storage Driver: devicemapper (with Server Version: 1.9.1 and kernel 4.2.6), and the bug does not reproduce, so we're still in "strange interaction between some change in the newer kernel and the AUFS patches" land.
Tested, and the bug is still present on the fresh 4.1.14 kernel, so we're still sitting on some commit backported to 4.1.13 interacting weirdly with the AUFS patches (and we didn't get lucky with it being already fixed in the interim).
I decided to give it the old college try and cloned the boot2docker repo, then modified the AUFS commit in the Dockerfile to the previous version. So: docker 1.9.1, kernel 4.1.13, plus the previous AUFS version that shipped before 1.9.1. Compilation is slow on my machine ... is there a docker swarm setup that I can run in conjunction with a git bisect to aggregate the results? That would be sweet.
Anyway, I will post my results shortly if it works...
update:
4.1.13 + this AUFS commit still exhibits the problem.
ENV AUFS_COMMIT 1724fe65683d126a92c6baeea0b3c7d0306c63ef
I'm not aware of any easy setup to aggregate the results, although one could conceivably be built.
FWIW, https://sources.debian.net/src/ca-certificates-java/jessie/debian/postinst.in/ is the exact script that's running in that package, and https://sources.debian.net/src/ca-certificates-java/jessie/src/main/java/org/debian/security/UpdateCertificates.java/ is the exact Java source that's being executed when we get the hang + defunct + pegged CPU.
Ran into a related issue (java process hangs) today.
Host environment: Linux lenovo 4.2.0-19-generic #23-Ubuntu SMP Wed Nov 11 11:39:30 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Distro: Ubuntu 15.10
Docker Engine: 1.9.1
Docker Machine: 0.5.0 (04cfa58)
I am following the multi-host networking tutorial. The only difference is that I am playing with the oracle/nosql image. That image is based on Oracle Linux and uses OpenJDK.
@brunoborges yes, that could be the same issue, see #18500 (comment)
@brunoborges just check your boot2docker.iso version; if it is 1.9.1, you could try downgrading to 1.9.0, recreating your machine, and pulling your images once again.
If you go this way, could you write a short report here?
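The downgrade can be done without touching the Toolbox install by pinning a new machine to an older ISO. A sketch, assuming the release URL follows the usual boot2docker releases naming and with "b2d-190" as an arbitrary machine name:

```shell
# Create a machine pinned to the 1.9.0 boot2docker ISO
docker-machine create -d virtualbox \
  --virtualbox-boot2docker-url https://github.com/boot2docker/boot2docker/releases/download/v1.9.0/boot2docker.iso \
  b2d-190

# Verify the server version inside the VM
docker-machine ssh b2d-190 docker version
```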
So I got to wondering why this only happens with Java, and not any other language. In one of my previous posts I detailed the most basic of reproductions by simply compiling and running
class Foo {
public static void main(String[] a) {
System.out.println("hellowerld");
}
}
for the failure case, which resulted in a defunct java process, and then
class Foo {
public static void main(String[] a) {
System.out.println("hellowerld");
System.exit(0);
}
}
for the expected (non defunct) case.
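To try the two variants above against an affected daemon, something like this works (a sketch; assumes the stock java:7 image and one of the Foo.java variants saved in the current directory):

```shell
# Compile and run the reproduction inside a container.
# On an aufs-backed 1.9.1 machine, the variant without System.exit(0)
# prints and then hangs as a defunct [java]; the System.exit(0) variant
# exits cleanly.
docker run --rm -v "$PWD":/src -w /src java:7 \
  sh -c 'javac Foo.java && java Foo'
```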
I then tried to reproduce something similar using Python. I was unsuccessful, but I tried. For those interested, I was trying to exhibit the last strace output, exit_group(0) = ?, that was seen from the zombie java process. (This link provided me with a lot of info about Python threading / seccomp / etc.: http://stackoverflow.com/questions/25678139/how-do-you-cleanly-exit-after-enabling-seccomp-in-python )
So off to kernel land: after rebuilding the boot2docker ISO and messing with the AUFS versions and kernel versions (none of which really made a difference), I got fed up with how slow the compilation process was using numproc=1, so I changed it to 6. ==> note: no longer 1 CPU (who only has 1 CPU nowadays?). Suddenly the failure case
class Foo {
public static void main(String[] a) {
System.out.println("hellowerld");
}
}
started working.
Obviously, the next thing to try was to bump it back down to 1 CPU. ==> FAIL: back to a defunct java process.
So then I wanted to explore more about how Java shuts down. It's not well defined, but with only 1 CPU this Java program was able to run successfully (please don't make fun of my horrible Java):
import java.util.Iterator;
import java.util.Set;
class Foo {
static public final Object a = new Object();
static {
final Object aa = a;
Runtime.getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
System.out.println("added one");
if (aa == null)
{ System.out.println("out"); }
}
});
System.out.println("exit");
Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
Thread[] threadArray = threadSet.toArray(new Thread[threadSet.size()]);
for(Thread xxx : threadArray)
{
System.out.println(xxx.toString());
}
//// System.exit(0);
}
static public void main(String[] a) {}
}
Can anyone else please confirm this behavior? << question is now moot
Update: Even with more than one core, a defunct java process can occur. (I was running cassandra-cli and it happened.)
docker-machine ssh myVM
ps -ef:
docker 6606 5863 0 Dec11 ? 00:00:00 /bin/sh /cassandra/bin/cassandra-cli -f /home/foo/my.cli -h 172.17.0.2
docker 6651 6606 99 Dec11 ? 00:41:29 [java] <defunct>
cat /proc/6606/stack
[<ffffffff8106e491>] do_wait+0x1ab/0x23f
[<ffffffff8106e5bc>] SYSC_wait4+0x97/0xb0
[<ffffffff8106d66b>] child_wait_callback+0x0/0x43
[<ffffffff8155466e>] system_call_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
cat /proc/6651/stack
[<ffffffff8106f06c>] do_exit+0x88f/0x8cc
[<ffffffff81075f8d>] signal_wake_up_state+0x23/0x36
[<ffffffff8106f104>] do_group_exit+0x36/0xa6
[<ffffffff8106f180>] __wake_up_parent+0x0/0x1d
[<ffffffff8155466e>] system_call_fastpath+0x12/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
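The diagnostics above can be collected from the host via docker-machine; a sketch (the PID placeholder must be substituted with a real PID from the ps output):

```shell
# Find defunct java processes inside the boot2docker VM
docker-machine ssh default 'ps -ef | grep "[j]ava.*defunct"'

# Dump the kernel stack of the parent and the zombie (substitute real PIDs)
docker-machine ssh default 'sudo cat /proc/PID/stack'
```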