No network inside containers launched inside dind on CentOS 7 host
ag-TJNII opened this issue · 12 comments
We're seeing that dind images newer than docker:24.0.6-dind have broken networking inside the inner containers. The following logs are from the docker@sha256:8f9c4d8cdaa2f87b5269d4d6759711c843c37e34a02b8bb45653e5b8f4e2f0a2 image, which I believe should include the updates from #463 (please let me know if it doesn't).
I can reproduce the issue by launching dind with:
docker run --rm -ti --privileged --name docker -e DOCKER_TLS_CERTDIR= -p 2375:2375 docker@sha256:8f9c4d8cdaa2f87b5269d4d6759711c843c37e34a02b8bb45653e5b8f4e2f0a2
INFO[2023-12-15T21:40:41.852525583Z] Starting up
WARN[2023-12-15T21:40:41.853150377Z] Binding to IP address without --tlsverify is insecure and gives root access on this machine to everyone who has access to your network. host="tcp://0.0.0.0:2375"
WARN[2023-12-15T21:40:41.853168377Z] Binding to an IP address, even on localhost, can also give access to scripts run in a browser. Be safe out there! host="tcp://0.0.0.0:2375"
WARN[2023-12-15T21:40:42.853328719Z] Binding to an IP address without --tlsverify is deprecated. Startup is intentionally being slowed down to show this message host="tcp://0.0.0.0:2375"
WARN[2023-12-15T21:40:42.853370720Z] Please consider generating tls certificates with client validation to prevent exposing unauthenticated root access to your network host="tcp://0.0.0.0:2375"
WARN[2023-12-15T21:40:42.853436601Z] You can override this by explicitly specifying '--tls=false' or '--tlsverify=false' host="tcp://0.0.0.0:2375"
WARN[2023-12-15T21:40:42.853452012Z] Support for listening on TCP without authentication or explicit intent to run without authentication will be removed in the next release host="tcp://0.0.0.0:2375"
WARN[2023-12-15T21:40:57.855024630Z] could not change group /var/run/docker.sock to docker: group docker not found
INFO[2023-12-15T21:40:57.855202154Z] containerd not running, starting managed containerd
INFO[2023-12-15T21:40:57.856178647Z] started new containerd process address=/var/run/docker/containerd/containerd.sock module=libcontainerd pid=30
INFO[2023-12-15T21:40:57.878615040Z] starting containerd revision=64b8a811b07ba6288238eefc14d898ee0b5b99ba version=v1.7.11
INFO[2023-12-15T21:40:57.907497173Z] loading plugin "io.containerd.event.v1.exchange"... type=io.containerd.event.v1
INFO[2023-12-15T21:40:57.907547544Z] loading plugin "io.containerd.internal.v1.opt"... type=io.containerd.internal.v1
INFO[2023-12-15T21:40:57.907792350Z] loading plugin "io.containerd.warning.v1.deprecations"... type=io.containerd.warning.v1
INFO[2023-12-15T21:40:57.907817501Z] loading plugin "io.containerd.snapshotter.v1.blockfile"... type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.907885642Z] skip loading plugin "io.containerd.snapshotter.v1.blockfile"... error="no scratch file generator: skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.907906552Z] loading plugin "io.containerd.snapshotter.v1.devmapper"... type=io.containerd.snapshotter.v1
WARN[2023-12-15T21:40:57.907922823Z] failed to load plugin io.containerd.snapshotter.v1.devmapper error="devmapper not configured"
INFO[2023-12-15T21:40:57.907934383Z] loading plugin "io.containerd.snapshotter.v1.native"... type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.908092727Z] loading plugin "io.containerd.snapshotter.v1.overlayfs"... type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.908369143Z] loading plugin "io.containerd.snapshotter.v1.aufs"... type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.913635006Z] skip loading plugin "io.containerd.snapshotter.v1.aufs"... error="aufs is not supported (modprobe aufs failed: exit status 1 \"ip: can't find device 'aufs'\\nmodprobe: can't change directory to '/lib/modules': No such file or directory\\n\"): skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.913674427Z] loading plugin "io.containerd.snapshotter.v1.zfs"... type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.913865222Z] skip loading plugin "io.containerd.snapshotter.v1.zfs"... error="path /var/lib/docker/containerd/daemon/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.containerd.snapshotter.v1
INFO[2023-12-15T21:40:57.913887322Z] loading plugin "io.containerd.content.v1.content"... type=io.containerd.content.v1
INFO[2023-12-15T21:40:57.914007495Z] loading plugin "io.containerd.metadata.v1.bolt"... type=io.containerd.metadata.v1
WARN[2023-12-15T21:40:57.914065496Z] could not use snapshotter devmapper in metadata plugin error="devmapper not configured"
INFO[2023-12-15T21:40:57.914084066Z] metadata content store policy set policy=shared
INFO[2023-12-15T21:40:57.918482609Z] loading plugin "io.containerd.gc.v1.scheduler"... type=io.containerd.gc.v1
INFO[2023-12-15T21:40:57.918562331Z] loading plugin "io.containerd.differ.v1.walking"... type=io.containerd.differ.v1
INFO[2023-12-15T21:40:57.918597992Z] loading plugin "io.containerd.lease.v1.manager"... type=io.containerd.lease.v1
INFO[2023-12-15T21:40:57.918619863Z] loading plugin "io.containerd.streaming.v1.manager"... type=io.containerd.streaming.v1
INFO[2023-12-15T21:40:57.918644253Z] loading plugin "io.containerd.runtime.v1.linux"... type=io.containerd.runtime.v1
INFO[2023-12-15T21:40:57.918811847Z] loading plugin "io.containerd.monitor.v1.cgroups"... type=io.containerd.monitor.v1
INFO[2023-12-15T21:40:57.919273038Z] loading plugin "io.containerd.runtime.v2.task"... type=io.containerd.runtime.v2
INFO[2023-12-15T21:40:57.919452991Z] loading plugin "io.containerd.runtime.v2.shim"... type=io.containerd.runtime.v2
INFO[2023-12-15T21:40:57.919484743Z] loading plugin "io.containerd.sandbox.store.v1.local"... type=io.containerd.sandbox.store.v1
INFO[2023-12-15T21:40:57.919507374Z] loading plugin "io.containerd.sandbox.controller.v1.local"... type=io.containerd.sandbox.controller.v1
INFO[2023-12-15T21:40:57.919525624Z] loading plugin "io.containerd.service.v1.containers-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919545674Z] loading plugin "io.containerd.service.v1.content-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919561565Z] loading plugin "io.containerd.service.v1.diff-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919578625Z] loading plugin "io.containerd.service.v1.images-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919596045Z] loading plugin "io.containerd.service.v1.introspection-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919611926Z] loading plugin "io.containerd.service.v1.namespaces-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919631026Z] loading plugin "io.containerd.service.v1.snapshots-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919645546Z] loading plugin "io.containerd.service.v1.tasks-service"... type=io.containerd.service.v1
INFO[2023-12-15T21:40:57.919677087Z] loading plugin "io.containerd.grpc.v1.containers"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919696098Z] loading plugin "io.containerd.grpc.v1.content"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919712028Z] loading plugin "io.containerd.grpc.v1.diff"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919727758Z] loading plugin "io.containerd.grpc.v1.events"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919743909Z] loading plugin "io.containerd.grpc.v1.images"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919762479Z] loading plugin "io.containerd.grpc.v1.introspection"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919778019Z] loading plugin "io.containerd.grpc.v1.leases"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919794330Z] loading plugin "io.containerd.grpc.v1.namespaces"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919809610Z] loading plugin "io.containerd.grpc.v1.sandbox-controllers"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919826850Z] loading plugin "io.containerd.grpc.v1.sandboxes"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919840961Z] loading plugin "io.containerd.grpc.v1.snapshots"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919856751Z] loading plugin "io.containerd.grpc.v1.streaming"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919872311Z] loading plugin "io.containerd.grpc.v1.tasks"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919891292Z] loading plugin "io.containerd.transfer.v1.local"... type=io.containerd.transfer.v1
INFO[2023-12-15T21:40:57.919920563Z] loading plugin "io.containerd.grpc.v1.transfer"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919935374Z] loading plugin "io.containerd.grpc.v1.version"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.919949474Z] loading plugin "io.containerd.internal.v1.restart"... type=io.containerd.internal.v1
INFO[2023-12-15T21:40:57.920042796Z] loading plugin "io.containerd.tracing.processor.v1.otlp"... type=io.containerd.tracing.processor.v1
INFO[2023-12-15T21:40:57.920069286Z] skip loading plugin "io.containerd.tracing.processor.v1.otlp"... error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
INFO[2023-12-15T21:40:57.920116297Z] loading plugin "io.containerd.internal.v1.tracing"... type=io.containerd.internal.v1
INFO[2023-12-15T21:40:57.920132368Z] skipping tracing processor initialization (no tracing plugin) error="no OpenTelemetry endpoint: skip plugin"
INFO[2023-12-15T21:40:57.920278081Z] loading plugin "io.containerd.grpc.v1.healthcheck"... type=io.containerd.grpc.v1
INFO[2023-12-15T21:40:57.920298541Z] loading plugin "io.containerd.nri.v1.nri"... type=io.containerd.nri.v1
INFO[2023-12-15T21:40:57.920314162Z] NRI interface is disabled by configuration.
INFO[2023-12-15T21:40:57.920648640Z] serving... address=/var/run/docker/containerd/containerd-debug.sock
INFO[2023-12-15T21:40:57.920785164Z] serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
INFO[2023-12-15T21:40:57.920911746Z] serving... address=/var/run/docker/containerd/containerd.sock
INFO[2023-12-15T21:40:57.920997008Z] containerd successfully booted in 0.043434s
INFO[2023-12-15T21:40:58.900983259Z] Loading containers: start.
WARN[2023-12-15T21:40:58.935706389Z] Running modprobe bridge br_netfilter failed with message: ip: can't find device 'bridge'
bridge 151336 1 br_netfilter
stp 12976 1 bridge
llc 14552 2 bridge,stp
ip: can't find device 'br_netfilter'
br_netfilter 22256 0
bridge 151336 1 br_netfilter
modprobe: can't change directory to '/lib/modules': No such file or directory
, error: exit status 1
INFO[2023-12-15T21:40:59.498230956Z] Loading containers: done.
WARN[2023-12-15T21:40:59.513887071Z] WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/go/attack-surface/
WARN[2023-12-15T21:40:59.513914842Z] WARNING: bridge-nf-call-iptables is disabled
WARN[2023-12-15T21:40:59.513922842Z] WARNING: bridge-nf-call-ip6tables is disabled
INFO[2023-12-15T21:40:59.513943883Z] Docker daemon commit=92884c2 graphdriver=overlay2 version=25.0.0-beta.2
INFO[2023-12-15T21:40:59.514081036Z] Daemon has completed initialization
INFO[2023-12-15T21:40:59.546091982Z] API listen on /var/run/docker.sock
INFO[2023-12-15T21:40:59.546096892Z] API listen on [::]:2375
time="2023-12-15T21:41:06.000786844Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2023-12-15T21:41:06.000902046Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2023-12-15T21:41:06.000920607Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2023-12-15T21:41:06.001128371Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
INFO[2023-12-15T21:42:36.437838558Z] ignoring event container=31ca0689fcf6bc80aa7960629fe2a4767bf71342a08cdbbf01bcc0c2516a6cf9 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
INFO[2023-12-15T21:42:36.438047554Z] shim disconnected id=31ca0689fcf6bc80aa7960629fe2a4767bf71342a08cdbbf01bcc0c2516a6cf9 namespace=moby
WARN[2023-12-15T21:42:36.438162196Z] cleaning up after shim disconnected id=31ca0689fcf6bc80aa7960629fe2a4767bf71342a08cdbbf01bcc0c2516a6cf9 namespace=moby
INFO[2023-12-15T21:42:36.438182847Z] cleaning up dead shim namespace=moby
WARN[2023-12-15T21:42:36.456385311Z] failed to close stdin: task 31ca0689fcf6bc80aa7960629fe2a4767bf71342a08cdbbf01bcc0c2516a6cf9 not found: not found
^CINFO[2023-12-15T21:42:38.223741354Z] Processing signal 'interrupt'
INFO[2023-12-15T21:42:38.225153157Z] stopping event stream following graceful shutdown error="<nil>" module=libcontainerd namespace=moby
INFO[2023-12-15T21:42:38.225646768Z] Daemon shutdown complete
INFO[2023-12-15T21:42:38.225722230Z] stopping healthcheck following graceful shutdown module=libcontainerd
INFO[2023-12-15T21:42:38.225751830Z] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
I believe the bridge warning is a red herring, as I also see it in docker:24.0.6-dind, which works.
I then run a debian container and try an apt-get update:
docker run --rm -ti debian:latest
Unable to find image 'debian:latest' locally
latest: Pulling from library/debian
90e5e7d8b87a: Pull complete
Digest: sha256:133a1f2aa9e55d1c93d0ae1aaa7b94fb141265d0ee3ea677175cdb96f5f990e5
Status: Downloaded newer image for debian:latest
root@31ca0689fcf6:/# apt-get update
Ign:1 http://deb.debian.org/debian bookworm InRelease
Ign:2 http://deb.debian.org/debian bookworm-updates InRelease
Ign:3 http://deb.debian.org/debian-security bookworm-security InRelease
Ign:1 http://deb.debian.org/debian bookworm InRelease
Ign:2 http://deb.debian.org/debian bookworm-updates InRelease
Ign:3 http://deb.debian.org/debian-security bookworm-security InRelease
Ign:1 http://deb.debian.org/debian bookworm InRelease
Ign:2 http://deb.debian.org/debian bookworm-updates InRelease
Ign:3 http://deb.debian.org/debian-security bookworm-security InRelease
Err:1 http://deb.debian.org/debian bookworm InRelease
Temporary failure resolving 'deb.debian.org'
Err:2 http://deb.debian.org/debian bookworm-updates InRelease
Temporary failure resolving 'deb.debian.org'
Err:3 http://deb.debian.org/debian-security bookworm-security InRelease
Temporary failure resolving 'deb.debian.org'
Reading package lists... Done
W: Failed to fetch http://deb.debian.org/debian/dists/bookworm/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://deb.debian.org/debian/dists/bookworm-updates/InRelease Temporary failure resolving 'deb.debian.org'
W: Failed to fetch http://deb.debian.org/debian-security/dists/bookworm-security/InRelease Temporary failure resolving 'deb.debian.org'
W: Some index files failed to download. They have been ignored, or old ones used instead.
Network works if I run the inner container with --net=host.
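For reference, the working invocation is just the same inner container run with host networking (a minimal example, not verbatim from the report above):
docker run --rm -ti --net=host debian:latest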
Host info:
- CentOS 7
- Docker version 24.0.4, build 3713ee1
- Kernel: 3.10.0-1160.15.2.el7.x86_64
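For anyone comparing environments, a quick way to check which iptables backend the host itself uses (a rough sketch, not part of the original report; on CentOS 7 this typically shows a 1.4.x legacy binary and no nf_tables module loaded):
iptables --version
lsmod | grep -E '^(ip_tables|nf_tables)'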
Hmm, those apt-get update errors sound suspiciously like seccomp failures -- any way you could get an aggressively newer version of libseccomp2 on your host and try again (or try --security-opt seccomp=unconfined on your debian container)?
(I'm not sure how libseccomp2 versions interact via Docker-in-Docker -- I knew at one point but the knowledge has left me.)
docker run --rm -ti --security-opt seccomp=unconfined debian:latest behaves the same way. I'll have to set up a test bed to try upgrading host libraries; the host I'm reproducing this on is an active node, so I can't fiddle too much there. I can put some cycles into that next week.
The version of libseccomp on the troubled host:
libseccomp.x86_64 2.3.1-4.el7 @centos7-x86_64-os
libseccomp.i686 2.3.1-4.el7 centos7-x86_64-os
libseccomp-devel.i686 2.3.1-4.el7 centos7-x86_64-os
libseccomp-devel.x86_64 2.3.1-4.el7 centos7-x86_64-os
Yeah, thanks for testing -- it's probably not libseccomp then.
My best guess now is that the CentOS 7 kernel supports nf_tables, but maybe it wasn't fully/completely backported to that kernel and thus doesn't work in a network namespace or something?
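One way to probe that theory on the host would be something like this (a rough sketch; it assumes the nft userland is installed, which it may not be on a stock CentOS 7 box):
# does nf_tables actually work inside a fresh network namespace on this kernel?
unshare --net sh -c 'nft add table inet probe && nft list tables'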
Also, to be clear, it's not just DNS that is failing. I'm also seeing ICMP and TCP failures.
docker run --rm -ti jonlabelle/network-tools
[network-tools]$ curl http://142.250.191.238 -ILX GET --connect-timeout 5
curl: (28) Failed to connect to 142.250.191.238 port 80 after 5001 ms: Timeout was reached
[network-tools]$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 7999ms
We are also experiencing this problem, which makes all our CI build jobs fail. Changing the tag from latest to the previous version on hundreds of repos is not ideal :(
Can you run the following one-liner on affected infrastructure and provide the full output?
docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
It should look something like this:
$ docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
+ modprobe nf_tables
+ :
+ iptables -nL
+ echo success nftables
success nftables
or:
$ docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! false iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
+ modprobe nf_tables
+ :
+ iptables -nL
+ modprobe ip_tables
ip: can't find device 'ip_tables'
ip_tables 36864 0
x_tables 53248 8 ip_tables,xt_mark,xt_nat,xt_tcpudp,xt_conntrack,xt_MASQUERADE,xt_addrtype,nft_compat
modprobe: can't change directory to '/lib/modules': No such file or directory
+ :
+ /usr/local/sbin/.iptables-legacy/iptables -nL
+ echo success legacy
success legacy
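For readers following along, here is the same one-liner restated with comments (just an annotated restatement of the logic above, not a new check):
modprobe nf_tables > /dev/null 2>&1 || :            # try to load nf_tables; ignore failure
if ! iptables -nL > /dev/null 2>&1; then            # the default iptables in the image is the nft-based one
    modprobe ip_tables || :                         # nft backend unusable: load legacy ip_tables instead
    /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1
    echo success legacy                             # the bundled legacy binaries work
else
    echo success nftables                           # the nft-based iptables works
fi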
# docker images | grep 'docker[[:space:]]\+dind[[:space:]]\+'
docker dind 6091c7bd89fd 3 days ago 331MB
# docker run -it --rm --privileged docker:dind sh -euxc 'modprobe nf_tables > /dev/null 2>&1 || :; if ! iptables -nL > /dev/null 2>&1; then modprobe ip_tables || :; /usr/local/sbin/.iptables-legacy/iptables -nL > /dev/null 2>&1; echo success legacy; else echo success nftables; fi'
+ modprobe nf_tables
+ :
+ iptables -nL
+ echo success nftables
success nftables
Any chance you could test #468?
docker build --pull 'https://github.com/docker-library/docker.git#refs/pull/468/merge:24/dind'
This did not resolve it, unfortunately. Still no network inside the inner containers.
That's absolutely flabbergasting.
I spun up my own CentOS 7 instance to try and debug further, and I managed to replicate immediately. What I've found is that the host is definitely still using the legacy iptables/xtables, and so far I've found no reliable way to detect that from inside the container. So, as far as I can tell, there's something deficient in either the network namespaces or nf_tables implementations in that CentOS kernel.
The best I've come up with is checking whether ip_tables is loaded and nf_tables is not (which means that if you've already run the current version of the container, which loads nf_tables, the check can't work correctly until you reboot or unload that module). This is pretty fragile, but it's really the best I can think of.
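A minimal shell sketch of that heuristic, as I understand it (my own illustration, not the actual entrypoint code):
# prefer the bundled legacy iptables when the host looks like a legacy-iptables system
# (fragile, as noted above: it only works if nf_tables hasn't already been loaded)
if grep -q '^ip_tables ' /proc/modules && ! grep -q '^nf_tables ' /proc/modules; then
    export PATH="/usr/local/sbin/.iptables-legacy:$PATH"
    echo 'host appears to be using legacy iptables; preferring iptables-legacy'
fi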
Is this something we could allow the user to specify via a config ENV var (something like the sketch below)? I also wonder how much of a concern this needs to be, as CentOS 7 goes EOL at the end of June. If this is a pain to maintain, I think a documented config setting for near-end-of-life / past-end-of-life setups is reasonable.
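For illustration only -- the variable name here is hypothetical, just to show what such an opt-in could look like:
# hypothetical env var forcing the entrypoint onto the bundled legacy iptables
docker run --rm -ti --privileged -e DOCKER_IPTABLES_LEGACY=1 -e DOCKER_TLS_CERTDIR= -p 2375:2375 docker:dind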
Ok, #468 is probably on hold until the new year (#468 (comment)), but here are some workarounds if you need to fix this before we can resolve that:
FROM docker:dind
ENV PATH /usr/local/sbin/.iptables-legacy:$PATH
or:
docker run ... --env PATH='/usr/local/sbin/.iptables-legacy:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin' ...
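To confirm the override took effect, something like this should point at the legacy binary (assuming the dind container is named docker as in the repro above):
docker exec docker sh -c 'command -v iptables && iptables --version'
# expected: /usr/local/sbin/.iptables-legacy/iptables and a version string ending in "(legacy)"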