Metaswitch/floki

Floki doesn't wait for dind service to be ready

dsteeley opened this issue · 2 comments

Since switching from the stable-dind image to the latest Docker-in-Docker (DinD) image, I have noticed that Docker commands can fail because of a race condition with the adjacent DinD container.
The error I get is:

docker: Cannot connect to the Docker daemon at tcp://floki-docker:2375. Is the docker daemon running?.

This happens about 10 seconds after the DinD container starts.

If I add an arbitrary 30-second delay before starting the docker build, the DinD container appears to be ready and the build succeeds.

Could a check be added, when running with dind: true or a specified DinD image, to verify that the Docker service is ready before floki runs any commands?
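A readiness check of this kind could poll the daemon's GET /_ping endpoint from the Docker Engine API instead of sleeping for a fixed time. Below is a minimal sketch in Python. It assumes the daemon listens on plain TCP, as the tcp://floki-docker:2375 address in the error shows. The names docker_ping and wait_for_docker are hypothetical, and the 60-second deadline and 0.5-second poll interval are arbitrary choices, not floki settings.

```python
import http.client
import time


def docker_ping(host: str, port: int = 2375, timeout: float = 2.0) -> bool:
    """Return True if a Docker daemon on plain TCP answers GET /_ping with HTTP 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/_ping")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout or a garbled reply:
        # treat all of these as "daemon not ready yet".
        return False
    finally:
        conn.close()


def wait_for_docker(host: str, port: int = 2375,
                    deadline: float = 60.0, interval: float = 0.5) -> bool:
    """Poll docker_ping until it succeeds or `deadline` seconds have elapsed."""
    start = time.monotonic()
    while True:
        if docker_ping(host, port):
            return True
        if time.monotonic() - start >= deadline:
            return False
        time.sleep(interval)
```

After starting the DinD container, floki could call something like wait_for_docker("floki-docker", 2375) and abort with a clear error if it returns False. Probing /_ping rather than only opening a TCP connection checks that the API is actually serving requests, not merely that the port is reachable.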

Thanks for raising this.

I think there is some merit to adding a check, but 10 seconds sounds like a long time for the dind container to start, and I wonder if there is an underlying issue here. That would probably be my first line of investigation.

Looking at the Docker startup logs, they begin with:

time="2022-02-25T19:27:13.527732181Z" level=info msg="Starting up"
time="2022-02-25T19:27:13.529046953Z" level=warning msg="Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network." host="tcp://0.0.0.0:2375"
time="2022-02-25T19:27:13.529119698Z" level=warning msg="Binding to an IP address, even on localhost, can also give access to scripts run in a browser. Be safe out there!" host="tcp://0.0.0.0:2375"
time="2022-02-25T19:27:14.529288539Z" level=warning msg="Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message" host="tcp://0.0.0.0:2375"
time="2022-02-25T19:27:14.529366540Z" level=warning msg="Please consider generating tls certificates with client validation to prevent exposing unauthenticated root access to your network" host="tcp://0.0.0.0:2375"
time="2022-02-25T19:27:14.529379147Z" level=warning msg="You can override this by explicitly specifying '--tls=false' or '--tlsverify=false'" host="tcp://0.0.0.0:2375"

Then the startup freezes for ~10 seconds before continuing.

  • Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network.
  • Binding to an IP address, even on localhost, can also give access to scripts run in a browser. Be safe out there!
  • Binding to an IP address without --tlsverify is deprecated. Start-up is intentionally being slowed down to show this message.
  • Please consider generating TLS certificates with client validation to prevent exposing unauthenticated root access to your network.
  • You can override this by explicitly specifying --tls=false or --tlsverify=false.

Presumably floki should pass --tls=false or --tlsverify=false?
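If floki passes the flag, the flag must reach dockerd rather than docker run, so it belongs after the image name. Below is a hypothetical sketch in Python of such an argument list, not floki's actual launch code. The function name dind_run_args, the image tag docker:dind and the container name floki-docker are illustrative assumptions. The sketch also assumes that the image's entrypoint forwards arguments beginning with "-" to dockerd, as the official docker:dind entrypoint does. It leaves out any network or environment options floki already sets.

```python
from typing import List


def dind_run_args(image: str = "docker:dind", name: str = "floki-docker") -> List[str]:
    """Hypothetical argv for starting the DinD sidecar with TLS explicitly disabled."""
    return [
        "docker", "run",
        "--rm",          # remove the sidecar when it stops
        "--detach",      # run in the background alongside the build container
        "--privileged",  # Docker-in-Docker requires a privileged container
        "--name", name,  # assumed name, matching the host name in the error
        image,
        # Everything after the image is handed to the image's entrypoint; an
        # argument starting with "-" is assumed to be forwarded to dockerd, which
        # then skips the deliberate start-up delay for unauthenticated TCP.
        "--tls=false",
    ]
```

Placing --tls=false before the image name would instead make the docker CLI parse it as an option to docker run, so the daemon would never see it.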