lxc/distrobuilder

debootstrap: DNS resolution failure

kaimaera opened this issue · 1 comments

This one blows my mind...

I do not know what to do with this. I am reporting the problem here as there is a workaround that could be implemented in distrobuilder, although whether it is a good thing to do or not is debatable.

On Ubuntu 22.04 (fully up to date) and using distrobuilder-3.0:

$ sudo $GOBIN/distrobuilder build-incus build.yaml
INFO   [2024-02-01T22:04:25Z] Downloading source
... debootstrap output omitted (all good) ...
INFO   [2024-02-01T22:06:06Z] Managing repositories                        
INFO   [2024-02-01T22:06:06Z] Running hooks                                 trigger=post-unpack
INFO   [2024-02-01T22:06:06Z] Managing packages                            
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://security.ubuntu.com/ubuntu jammy-security InRelease
Ign:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://security.ubuntu.com/ubuntu jammy-security InRelease
Ign:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Ign:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:2 http://security.ubuntu.com/ubuntu jammy-security InRelease
Ign:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Err:1 http://archive.ubuntu.com/ubuntu jammy InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:2 http://security.ubuntu.com/ubuntu jammy-security InRelease
  Temporary failure resolving 'security.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Reading package lists... Done
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/jammy-updates/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease  Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.

The build machine is configured with the systemd-resolved stub (/etc/resolv.conf -> /run/systemd/resolve/stub-resolv.conf):

nameserver 127.0.0.53
options edns0 trust-ad
search Home

I am using a build spec derived from lxc-ci, so I am pretty confident that is not the issue.

I have spent many frustrating hours trying to understand what is going on. Once debootstrap has set up the rootfs, distrobuilder sets up a chroot environment in which to run apt: it mounts proc and sysfs as usual, uses tmpfs for /run and /dev (the latter gets populated with a minimal set of devices) and binds /etc/resolv.conf into the rootfs (the implementation is pretty nifty, I have to say). I have confirmed that it does the right thing and that the mounts and resolver configuration are valid when inside the chroot. So it is not an issue with distrobuilder directly...

Setting up the chroot manually (using --cleanup=false and --cache-dir then applying the same mounts), I was able to reproduce the issue with apt update. I had a surprise when checking connectivity with ping: not only did ping work fine, it did not have any trouble resolving DNS names (and in particular it could reach archive.ubuntu.com). So not a systemd-resolved issue (which, whilst researching the problem, I had started to develop a prejudice against)...

So that leaves apt (or the libraries it uses for DNS resolution). I noticed two things:

  • The issue is not solved by replacing the stub configuration with a real one (e.g. using nameserver 8.8.8.8). To be certain, I traced the apt syscalls with both configurations (stub and upstream) and the behaviour was the same:
connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}
  • However, apt is able to resolve again when /run/systemd is bound to the chroot (I tried with /run/systemd/resolve but that was not enough); this perhaps allows it to use one of the systemd sockets instead of UDP queries (wild speculation). Anyway, if the problem is experienced by a wider audience, this is the basis of a workaround that distrobuilder could implement when it detects the corresponding /etc/resolv.conf redirection. There are no doubt security implications though; but it could be provided as an explicit option with appropriate warnings.
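For what it is worth, detecting the /etc/resolv.conf redirection mentioned above could look roughly like the sketch below. It is only an illustration in shell, not distrobuilder code: the helper name resolves_to is made up, and the stub path is the one from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of distrobuilder): succeeds when path $1
# resolves, after following symlinks, to the same file as path $2.
resolves_to() {
    actual=$(readlink -f "$1") || return 1
    expected=$(readlink -f "$2") || return 1
    [ "$actual" = "$expected" ]
}

# Stub location as reported in this issue.
STUB=/run/systemd/resolve/stub-resolv.conf

if resolves_to /etc/resolv.conf "$STUB"; then
    echo "/etc/resolv.conf points at the systemd-resolved stub"
else
    echo "/etc/resolv.conf does not point at the systemd-resolved stub"
fi
```

Canonicalising both sides with readlink -f keeps the comparison valid when an intermediate directory is itself a symlink.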

If feedback is positive, I would be happy to contribute some commits... Then again, it could be that my build server OS configuration is toast.

This was bugging me. Checked the APT source, it defaults to using getaddrinfo, so libc/nss. It just had to work...
Then it struck me: APT uses a non-root sandbox user. So a permissions issue then?
Sure enough, root umask on the build system was 077.
sigh (polite version of my reaction).
It's outright embarrassing.
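For anyone landing here with the same symptom, a minimal sketch of the mechanism: anything root creates under umask 077 ends up with no permissions for "other", so an unprivileged user such as the APT sandbox user cannot get through it. Which path was actually affected on my system is not shown here; the helper names are made up for illustration and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: create a directory under the given umask and print
# the octal mode it ends up with. Runs in a subshell so the caller's umask
# is left untouched.
mode_under_umask() {
    (
        umask "$1"
        dir=$(mktemp -d) || exit 1
        mkdir "$dir/sub"
        stat -c '%a' "$dir/sub"
        rm -rf "$dir"
    )
}

# Hypothetical helper: succeeds when "other" users have both read and
# execute permission on $1 (last octal digit has bits 4 and 1 set).
world_accessible() {
    mode=$(stat -c '%a' "$1") || return 1
    other=${mode#"${mode%?}"}   # keep only the last octal digit
    [ $((other & 5)) -eq 5 ]
}

echo "umask 022 -> $(mode_under_umask 022)"   # prints: umask 022 -> 755
echo "umask 077 -> $(mode_under_umask 077)"   # prints: umask 077 -> 700
```

A more direct check, which needs root inside the chroot and which I did not record here, would be to run the lookup as the sandbox user (named _apt on Debian and Ubuntu), for example with runuser -u _apt -- getent ahosts archive.ubuntu.com.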