Azure/custom-script-extension-linux

Forced IPv6 DNS resolution, even if IPv6 is disabled on Linux box

marinnedea opened this issue · 3 comments

Disabling IPv6 on a Linux box triggers strange behavior when running the Custom Script Extension for Linux:

##[error]VMExtensionProvisioningError: VM has reported a failure when processing extension 'CustomScript'.Error message: 'Enable failed: processing file downloads failed: failed to download file[0]: failed to download file: http request failed: Get [REDACTED] dial tcp: lookup redacted.blob.core.windows.net on [::1]:53: dial udp [::1]:53: socket: address family not supported by protocol'

In short, the extension tries to download the script using IPv6 DNS resolution, even though IPv6 is fully disabled on the Linux box.

Digging further into the problem, I found this to be related to a default behavior in Go itself:
The following code from Go's standard net package seems to make the resolver fall back to the loopback nameservers, including the IPv6 one ([::1]:53), if the file /etc/resolv.conf cannot be read:

// Fallback nameservers in Go's net package, used when
// /etc/resolv.conf cannot be read.
var (
    defaultNS   = []string{"127.0.0.1:53", "[::1]:53"}
    getHostname = os.Hostname // variable for testing
)

Since I'm not a Go developer, I don't claim to fully understand what's going on there, but at first glance the implementation seems wrong, because:

  • it doesn't check whether the IPv6 protocol is enabled before switching to it
  • it doesn't check whether /etc/resolv.conf is a symlink to a different file (maybe this causes it to fail to read the configuration, instead of continuing with IPv4 as it should)
  • it should rely on the system defaults as its first option, which doesn't seem to happen here.

OS where this happens:
Barracuda NG Control Center (the underlying Linux is CentOS 7.x)

The information below is taken from an affected VM; as you can see, IPv6 is not in effect there:

~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 3500 qdisc noqueue state UNKNOWN group default qlen 1000
	link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
	inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
	   valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
	link/ether 00:0d:3a:22:53:53 brd ff:ff:ff:ff:ff:ff
	inet 10.0.60.241/22 brd 10.0.63.255 scope global eth0
	   valid_lft forever preferred_lft forever

~]# cat /etc/hosts

#/etc/hosts
#created: 1602848252
#do not edit manually all changes will be lost

127.0.0.1           	localhost
10.0.60.241        		itweup01cccvm01 itweup01cccvm01.acld99.corp myboxip
#Additional hosts

~]# cat /etc/sysctl.d/40-ipv6.conf
cat: /etc/sysctl.d/40-ipv6.conf: No such file or directory

~]# ls -ltr /etc/resolv.conf
lrwxrwxrwx 1 root root 22 Oct 19 14:38 /etc/resolv.conf -> /etc/resolv.conf.wbdns

~]# cat /etc/resolv.conf
#/etc/resolv.conf
#do not edit manually all changes will be lost
search acld99.corp
nameserver 10.0.60.4
nameserver 10.0.60.5
options rotate
options timeout:5

Of course, enabling IPv6 fixes the issue, but the extension should work regardless of which protocol is used on the machine.
Please fix this in the extension.
Thank you!

There may be more to it. Just out of curiosity, I tried to replicate this using a RHEL 8 image: I disabled IPv6 and even deleted /etc/resolv.conf, but the extension still works.

Are you able to replicate it on a standard CentOS/RHEL image?

Not really. It only happens on Barracuda, so I'm starting to think there's more to the way DNS works on Barracuda.
On CentOS 7.4 I was able to reach a point where the script doesn't get downloaded, but I found nothing in the logs related to IPv6 when that happens. In fact, nothing gets logged at all.

It does work, however, if you move the download into "commandToExecute" and omit the "fileUris" parameter completely.
That tells me DNS on the machine works as designed, except when the script download is triggered through the extension's "fileUris" setting.

For example, instead of:

{
  "fileUris": ["https://raw.githubusercontent.com/Microsoft/dotnet-core-sample-templates/master/dotnet-core-music-linux/scripts/config-music.sh"],
  "commandToExecute": "./config-music.sh"
}

use something like:

{
  "commandToExecute": "wget https://raw.githubusercontent.com/Microsoft/dotnet-core-sample-templates/master/dotnet-core-music-linux/scripts/config-music.sh && chmod +x config-music.sh && ./config-music.sh"
}
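If these settings are generated from code rather than written by hand, a minimal Go sketch could look like the following. The `scriptSettings` struct and `encodeSettings` function are hypothetical and model only the two settings used in the examples above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// scriptSettings is a hypothetical struct holding the two Custom Script
// Extension settings used above. omitempty drops "fileUris" entirely
// when no files are listed (the workaround form).
type scriptSettings struct {
	FileURIs         []string `json:"fileUris,omitempty"`
	CommandToExecute string   `json:"commandToExecute"`
}

// encodeSettings renders the settings as compact JSON without HTML
// escaping, so "&&" in the command is not written as "\u0026\u0026".
func encodeSettings(s scriptSettings) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(s); err != nil {
		return "", err
	}
	return string(bytes.TrimSpace(buf.Bytes())), nil // drop trailing newline
}

func main() {
	const scriptURL = "https://raw.githubusercontent.com/Microsoft/dotnet-core-sample-templates/master/dotnet-core-music-linux/scripts/config-music.sh"

	// Original form: the extension downloads the script via fileUris.
	viaFileURIs := scriptSettings{
		FileURIs:         []string{scriptURL},
		CommandToExecute: "./config-music.sh",
	}
	// Workaround: the command downloads the script itself.
	viaCommand := scriptSettings{
		CommandToExecute: "wget " + scriptURL + " && chmod +x config-music.sh && ./config-music.sh",
	}

	for _, s := range []scriptSettings{viaFileURIs, viaCommand} {
		out, err := encodeSettings(s)
		if err != nil {
			fmt.Println("encode failed:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Disabling HTML escaping is only cosmetic: json.Marshal would emit `\u0026\u0026`, which decodes to the same string.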

(I used https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/custom-script-linux#azure-cli as the source for the sample code above.)

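Found the problem:
WALinuxAgent runs from a chrooted (jailed) environment rooted at the /chroot/waagent directory, and there was no /chroot/waagent/etc/resolv.conf file present there. Without it, DNS lookups made from inside the jail presumably fall back to the default loopback nameservers quoted earlier, which would explain the [::1]:53 error.
So the fix was as simple as:

cp -p /etc/resolv.conf /chroot/waagent/etc/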

NOTE:
The above applies to Barracuda instances. I have no idea how other appliances handle this.