kubernetes/kops

Inconsistencies between qualified names on AWS nodes

rifelpet opened this issue · 2 comments

/kind bug
/kind failing-test

Our grid jobs for RHEL-based distros are failing a test that was recently unskipped for unrelated reasons (#16176).

https://testgrid.k8s.io/kops-grid#kops-grid-cilium-amzn2-k28

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/e2e-kops-grid-cilium-amzn2-k28/1756260864102502400

[FAIL] [sig-network] Networking Granular Checks: Services [It] should function for service endpoints using hostNetwork 
   [FAILED] failed dialing endpoint, did not find expected responses... 
  Tries 46
  Command curl -g -q -s 'http://100.96.4.93:9080/dial?request=hostname&protocol=http&host=100.66.81.213&port=80&tries=1'
  retrieved map[i-03b17693021906ac2.eu-west-1.compute.internal:{} i-03fbc6f079db37ce7.eu-west-1.compute.internal:{} i-0c90e87e766b90952.eu-west-1.compute.internal:{} i-0fd0d694876a8befc.eu-west-1.compute.internal:{}]
  expected map[i-03b17693021906ac2:{} i-03fbc6f079db37ce7:{} i-0c90e87e766b90952:{} i-0fd0d694876a8befc:{}]

This test expects unqualified names but receives fully qualified names. The expected values come from the kubernetes.io/hostname label on nodes, which is the same as the node name and, as the output above shows, is the unqualified instance ID.

The test's actual data comes from running the hostname command on a hostNetwork pod.
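
To make the mismatch concrete, here is a minimal Python sketch using two of the four names from the log above. It is illustrative only and is not the e2e framework's code (which is written in Go); the variable names are made up for this example.

```python
# Names copied from the failure log above (two of the four nodes).
# Expected values: what the kubernetes.io/hostname label holds.
expected = {"i-03b17693021906ac2", "i-03fbc6f079db37ce7"}

# Retrieved values: what `hostname` printed in the hostNetwork pods.
retrieved = {
    "i-03b17693021906ac2.eu-west-1.compute.internal",
    "i-03fbc6f079db37ce7.eu-west-1.compute.internal",
}

# An exact set comparison fails even though every node responded.
exact_match = expected == retrieved

# Keeping only the first DNS label of each retrieved name recovers the node names.
short_names = {name.split(".", 1)[0] for name in retrieved}
```

Here `exact_match` is false while `short_names` equals `expected`, so the only difference between the two sets is the domain suffix.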

A list of our distros and whether hostname returns a fully qualified name:

  • AL 2 - yes
  • AL 2023 - yes
  • Debian 10 - no
  • Debian 12 - no
  • Flatcar - no
  • RHEL 8 - yes
  • RHEL 9 - yes
  • Rocky 8 - yes
  • Ubuntu 20.04 - no
  • Ubuntu 22.04 - no

I think our best path forward would be to configure the RHEL-based distros to return the unqualified name for hostname. This would match the behavior of the other distros.

Alternatively, we could make all node names fully qualified, like i-03fbc6f079db37ce7.eu-west-1.compute.internal, but this feels more disruptive.

This relates to kubernetes/kubernetes#121018. The e2e test logic could also be updated to handle either qualified or unqualified hostname output.
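
As a rough illustration of what accepting either form could look like, the sketch below treats a reported hostname as matching when it equals the node name or starts with the node name followed by a dot. It is written in Python for brevity, whereas the real test is in Go, and `hostname_matches` and `all_endpoints_seen` are hypothetical names rather than existing e2e helpers.

```python
def hostname_matches(node_name: str, reported: str) -> bool:
    """Return True if `reported` is `node_name` or a qualified form of it."""
    return reported == node_name or reported.startswith(node_name + ".")


def all_endpoints_seen(expected: set, retrieved: set) -> bool:
    """True when every expected node answered and no unexpected name answered."""
    every_node_answered = all(
        any(hostname_matches(node, r) for r in retrieved) for node in expected
    )
    no_unexpected_names = all(
        any(hostname_matches(node, r) for node in expected) for r in retrieved
    )
    return every_node_answered and no_unexpected_names
```

Requiring the dot after the node name keeps a longer, unrelated name that merely shares a prefix from being accepted as a match.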

/kind office-hours