Daemon restart not starting containers with shared network in the correct order (--network=container:...)
stevenboyd opened this issue · 6 comments
If you have two containers both with a restart policy of always and they share a network stack, the docker daemon may not start them in the correct order.
For example, suppose you start two containers like so:
docker run -d --name a --restart always some_image
docker run -d --name b --restart always --net=container:a some_other_image
If you then restart the docker daemon, it may try to restart container b before container a, so container b fails to start and this message shows up in the daemon logs:
[debug] daemon.go:384 Failed to start container id_of_b: cannot join network of a non running container: a
I would expect docker to boot up container a before container b, much like it boots up linked containers in the correct order.
This is similar to #10462
docker version
Client version: 1.3.2
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): 39fa2fa/1.3.2
OS/Arch (client): linux/amd64
Server version: 1.3.2
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): 39fa2fa/1.3.2
docker info
Containers: 3
Images: 233
Storage Driver: devicemapper
Pool Name: docker-253:2-131678-pool
Pool Blocksize: 65.54 kB
Data file: /var/lib/docker/devicemapper/devicemapper/data
Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 11.31 GB
Data Space Total: 107.4 GB
Metadata Space Used: 17.21 MB
Metadata Space Total: 2.147 GB
Library Version: 1.02.89-RHEL6 (2014-09-01)
Execution Driver: native-0.2
Kernel Version: 2.6.32-504.3.3.el6.x86_64
Operating System: <unknown>
Debug mode (server): true
Debug mode (client): false
Fds: 16
Goroutines: 13
EventsListeners: 0
Init SHA1: da6572a9f895dbbc52c7a03d3d391e45e56dc021
Init Path: /usr/libexec/docker/dockerinit
Username: appfoliodocker
Registry: [https://index.docker.io/v1/]
Environment details (AWS, VirtualBox, physical, etc.):
CentOS 6.6 box
How reproducible:
A given configuration of containers seems to fail reliably. However, adding or removing other containers can change the load order and may produce different results.
Here are the steps to reproduce with boot2docker:
boot2docker ssh
sudo su root
docker run -d --name mongo --restart always mongo
docker run -d --name redis --restart always --net container:mongo redis
/etc/init.d/docker stop
/etc/init.d/docker start
docker ps
docker ps will show only mongo running. The logs will show something like this:
time="2015-03-26T00:22:10Z" level="debug" msg="Loaded container 8b9b8bdac273d40a7825f2f41e27fdcdd3683c8da78df2429987581e249e5540"
time="2015-03-26T00:22:10Z" level="debug" msg="Loaded container bba7b4599c933a63214240f9069c14953e7c28292a946426a1d62f5c13b6d438"
time="2015-03-26T00:22:10Z" level="debug" msg="Restarting containers..."
time="2015-03-26T00:22:10Z" level="debug" msg="Starting container bba7b4599c933a63214240f9069c14953e7c28292a946426a1d62f5c13b6d438"
time="2015-03-26T00:22:10Z" level="debug" msg="Failed to start container bba7b4599c933a63214240f9069c14953e7c28292a946426a1d62f5c13b6d438: cannot join network of a non running container: mongo"
time="2015-03-26T00:22:10Z" level="debug" msg="Starting container 8b9b8bdac273d40a7825f2f41e27fdcdd3683c8da78df2429987581e249e5540"
time="2015-03-26T00:22:10Z" level="info" msg="+job allocate_interface(8b9b8bdac273d40a7825f2f41e27fdcdd3683c8da78df2429987581e249e5540)"
time="2015-03-26T00:22:10Z" level="info" msg="-job allocate_interface(8b9b8bdac273d40a7825f2f41e27fdcdd3683c8da78df2429987581e249e5540) = OK (0)"
time="2015-03-26T00:22:10Z" level="info" msg="+job log(start, 8b9b8bdac273d40a7825f2f41e27fdcdd3683c8da78df2429987581e249e5540, mongo:latest)"
time="2015-03-26T00:22:10Z" level="info" msg="-job log(start, 8b9b8bdac273d40a7825f2f41e27fdcdd3683c8da78df2429987581e249e5540, mongo:latest) = OK (0)"
time="2015-03-26T00:22:11Z" level="info" msg="docker daemon: 1.5.0 a8a31ef; execdriver: native-0.2; graphdriver: aufs"
time="2015-03-26T00:22:11Z" level="info" msg="+job acceptconnections()"
time="2015-03-26T00:22:11Z" level="info" msg="-job acceptconnections() = OK (0)"
+kind/feature
#dibs
I'm not sure about the current implementation, but to start containers up in the correct order, Docker needs to build a DAG of their dependencies. See how I do it in docker-fw.
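To make the DAG idea concrete, here is a minimal sketch in Python of ordering containers so that any container using `container:<name>` as its network mode starts after the container it joins. This is not how the daemon or docker-fw implements it. The function names `network_parent` and `start_order` are hypothetical, the input is assumed to be a plain mapping of container name to network mode string, and the sketch assumes the parent is referenced by name rather than by ID.

```python
from collections import deque

PREFIX = "container:"


def network_parent(network_mode):
    """Return the name of the container whose network stack is joined, or None."""
    if network_mode.startswith(PREFIX):
        return network_mode[len(PREFIX):]
    return None


def start_order(network_modes):
    """Return container names ordered so that network parents come first.

    network_modes maps a container name to its network mode string
    (hypothetical input shape, e.g. {"a": "bridge", "b": "container:a"}).
    """
    dependents = {name: [] for name in network_modes}
    pending = {name: 0 for name in network_modes}

    # Build the dependency edges: parent -> containers joining its network.
    for name, mode in network_modes.items():
        parent = network_parent(mode)
        if parent is None:
            continue
        if parent not in network_modes:
            raise ValueError("unknown network parent %r for %r" % (parent, name))
        dependents[parent].append(name)
        pending[name] += 1

    # Kahn's algorithm: repeatedly start containers with no unstarted parent.
    ready = deque(sorted(name for name, count in pending.items() if count == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in sorted(dependents[name]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    # Anything left over is part of a dependency cycle and can never start.
    if len(order) != len(network_modes):
        raise ValueError("dependency cycle between containers")
    return order
```

With the containers from the report, `start_order({"b": "container:a", "a": "bridge"})` puts `a` ahead of `b` regardless of the order in which the daemon loaded them. Links and other dependencies could be added as extra edges in the same graph.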