terraform-google-modules/terraform-google-network

[question] changes in fully qualified domain names

vsoch opened this issue · 9 comments

vsoch commented

Hiya! I have a few questions about changes I've seen in this module's behavior over time. I can't pinpoint the dates exactly, but I ran (mostly) the same deployment a few months apart and saw the following differences:

Fully Qualified Domain Names

By default, the hostname that came up used to be of the form gffw-compute-a-001, and it now has a suffix: gffw-compute-a-001.c.llnl-flux.internal. Could that be a setting here?
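
For anyone whose tooling compares hostnames, a minimal sketch of normalizing both forms to the short name, so that gffw-compute-a-001 and gffw-compute-a-001.c.llnl-flux.internal are treated as the same node. The function names `short_hostname` and `same_host` are hypothetical helpers for illustration, not part of this module or of Flux:

```python
def short_hostname(name: str) -> str:
    """Return the first DNS label, dropping any domain suffix."""
    return name.split(".", 1)[0]


def same_host(a: str, b: str) -> bool:
    """Compare two hostnames by short name, ignoring case."""
    return short_hostname(a).lower() == short_hostname(b).lower()
```

Passing `socket.gethostname()` through `short_hostname` should give the same value whether the VM reports the short or the fully qualified form. This only works around the difference; it does not explain why the hostname changed.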

Network name

Before, I used to have my workers connect to port 8050 on eth0, but now the network interface seems to be called ens4. Is that linked to a change here?

I'm concerned because I don't see eth0 here:

$ sudo ifconfig
ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.10.0.4  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::4001:aff:fe0a:4  prefixlen 64  scopeid 0x20<link>
        ether 42:01:0a:0a:00:04  txqueuelen 1000  (Ethernet)
        RX packets 1732  bytes 266656 (260.4 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1747  bytes 311514 (304.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 90  bytes 6400 (6.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 90  bytes 6400 (6.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

And from what I've read about ens4:

ens4 is an inactive device with no NetworkManager connection profile defined.

And in practice I can start my lead broker on it, but nothing can connect to it. I think this is a bug, or possibly some configuration that is wonky so the networking is not working as it used to.
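
A note on the naming: ens4 follows the "predictable network interface names" scheme used by systemd/udev, while eth0 is the older kernel-assigned name, and which one appears depends on the image. A configuration that hard-codes eth0 will therefore not find the interface on an image that uses the other scheme. Below is a minimal sketch of looking up the interface that carries the default route instead of hard-coding a name. It assumes a Linux guest where `/proc/net/route` is readable; `default_interface` is a hypothetical helper, and whether this was the actual cause of the connection failure is not established in this thread.

```python
from typing import Optional

RTF_UP = 0x1  # route flag: the route is usable


def default_interface(route_table: str) -> Optional[str]:
    """Return the interface of the IPv4 default route.

    `route_table` is the text content of /proc/net/route: a header line
    followed by whitespace-separated rows with hexadecimal fields
    (Iface, Destination, Gateway, Flags, ...).
    """
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        iface, destination, flags = fields[0], fields[1], int(fields[3], 16)
        # Destination 00000000 is 0.0.0.0, i.e. the default route.
        if destination == "00000000" and flags & RTF_UP:
            return iface
    return None
```

On the VM, reading `/proc/net/route` and passing its contents to `default_interface` would be expected to return ens4 on the new image and eth0 on the old one, so a broker bind address could be derived from that result rather than from a fixed interface name.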

Thanks for your help! Apologies, I'm not very experienced with networking, but I was curious. The main change I made (which could possibly have led to the above too) was switching from a Rocky 8 to a Debian bookworm base.

vsoch commented

And for comparison (with the previously cached modules), I can see an eth0, the hostname is shorter, and my application works!

[sochat1_llnl_gov@gffw-compute-a-001 ~]$ flux resource list
     STATE NNODES   NCORES NODELIST
      free      3       12 gffw-compute-a-[001-003]
 allocated      0        0 
      down      0        0 
[sochat1_llnl_gov@gffw-compute-a-001 ~]$ ifconfig 
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1460
        inet 10.10.0.5  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::753f:8614:d6ea:fea2  prefixlen 64  scopeid 0x20<link>
        ether 42:01:0a:0a:00:05  txqueuelen 1000  (Ethernet)
        RX packets 25785  bytes 170143603 (162.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18977  bytes 1363020 (1.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 2  bytes 140 (140.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2  bytes 140 (140.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[sochat1_llnl_gov@gffw-compute-a-001 ~]$ hostname
gffw-compute-a-001

vsoch commented

The /etc/hosts also looks very different - here is the working setup:

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.0.5 gffw-compute-a-001.c.llnl-flux.internal gffw-compute-a-001  # Added by Google
169.254.169.254 metadata.google.internal  # Added by Google

And the broken one:

$ cat /etc/hosts
127.0.0.1	localhost
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters
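
The visible difference between the two files is that the working one has an entry mapping the instance's IP to both its fully qualified and short names, and the broken one has none. A minimal sketch of checking for that entry programmatically; `names_in_hosts` is a hypothetical helper, and it only parses text in the standard hosts-file format (IP address followed by names, with `#` starting a comment):

```python
from typing import Set


def names_in_hosts(hosts_text: str) -> Set[str]:
    """Collect every hostname or alias listed in hosts-file text."""
    names: Set[str] = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # fields[0] is the IP address
    return names
```

Checking whether the short hostname is in `names_in_hosts(open("/etc/hosts").read())` would distinguish the two setups shown above.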

vsoch commented

Okay, I think bookworm might be missing network interface firmware! 😱 Testing whether I can figure out how to install it...

vsoch commented

Okay, I've now tried bullseye (debian-11), and that fixed the odd-looking DNS names and the /etc/hosts file. The same is true on Ubuntu, but there is still no eth0 device at all. I don't even know how to debug this :(

This is what I'm seeing: https://twitter.com/vsoch/status/1687610567765438464

I hope you can help; I'm out of ideas.

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

vsoch commented

It's not stale - nobody has responded to my original issue. :(

@vsoch please create a ticket with Google Cloud Support. This issue is not related to this module.

Thanks

I agree - there's nothing directly at the vpc network or subnet level that would control the VM hostname.

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days