kubernetes/kops

Provisioning a cluster on Hetzner with Debian 12 images fails

Closed this issue · 0 comments

/kind bug

1. What kops version are you running? The command kops version will display
this information.

1.28.4

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: v1.28.1
Server Version: v1.28.6

3. What cloud provider are you using?

Hetzner

4. What commands did you run? What is the simplest way to reproduce this issue?

kops create cluster --name=test-cluster \
  --ssh-public-key=~/.ssh/kops_rsa.pub --cloud=hetzner --node-count=3 --zones=nbg1 \
  --image=debian-12 --control-plane-count=3 --networking=cilium --network-cidr=10.10.0.0/16 \
  --node-size cx21

5. What happened after the commands executed?

The command completed successfully; however, the control plane never came up. The kubelet service was failing repeatedly with the following message:

Could not open resolv conf file." err="open /run/systemd/resolve/resolv.conf: no such file or directory
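The failure is just a missing file: the kubelet on these nodes is configured to read the systemd-resolved stub file, but Debian 12 no longer ships systemd-resolved by default, so the Hetzner image never creates it. A minimal simulation of the condition (a scratch directory stands in for /run/systemd/resolve so this runs anywhere; the nameserver value is a placeholder):

```shell
# Stand-in for /run/systemd/resolve on the node.
run=$(mktemp -d)

# Hetzner Debian 12 state: systemd-resolved is not installed, so the
# stub file the kubelet is pointed at does not exist.
[ -e "$run/resolv.conf" ] || echo "stub missing: kubelet cannot create pod sandboxes"

# State after installing and enabling systemd-resolved: the stub exists.
printf 'nameserver 192.0.2.53\n' > "$run/resolv.conf"
[ -e "$run/resolv.conf" ] && echo "stub present: kubelet can read it"

rm -rf "$run"
```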

6. What did you expect to happen?

I expected the control-plane nodes to start so that the cluster could be used.

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2024-04-17T14:03:00Z"
  name: test-cluster
spec:
  api:
    loadBalancer:
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: hetzner
  configBase: s3://test-cluster/test-cluster
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - instanceGroup: control-plane-nbg1-1
      name: etcd-1
    - instanceGroup: control-plane-nbg1-2
      name: etcd-2
    - instanceGroup: control-plane-nbg1-3
      name: etcd-3
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - instanceGroup: control-plane-nbg1-1
      name: etcd-1
    - instanceGroup: control-plane-nbg1-2
      name: etcd-2
    - instanceGroup: control-plane-nbg1-3
      name: etcd-3
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  kubernetesVersion: 1.28.6
  networkCIDR: 10.10.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - name: nbg1
    type: Public
    zone: nbg1
  topology:
    dns:
      type: None

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-17T14:03:00Z"
  labels:
    kops.k8s.io/cluster: test-cluster
  name: control-plane-nbg1-1
spec:
  image: debian-12
  machineType: cx21
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - nbg1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-17T14:03:00Z"
  labels:
    kops.k8s.io/cluster: test-cluster
  name: control-plane-nbg1-2
spec:
  image: debian-12
  machineType: cx21
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - nbg1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-17T14:03:00Z"
  labels:
    kops.k8s.io/cluster: test-cluster
  name: control-plane-nbg1-3
spec:
  image: debian-12
  machineType: cx21
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - nbg1

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2024-04-17T14:03:00Z"
  labels:
    kops.k8s.io/cluster: test-cluster
  name: nodes-nbg1
spec:
  image: debian-12
  machineType: cx21
  maxSize: 3
  minSize: 3
  role: Node
  subnets:
  - nbg1
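For what it's worth, a possible workaround until the distro detection is fixed might be to point the kubelet at the glibc resolv.conf explicitly in the Cluster spec (assuming kubelet.resolvConf is honored on Hetzner; I have not verified this):

```yaml
# Hypothetical workaround in the Cluster spec: have the kubelet read
# /etc/resolv.conf directly instead of the systemd-resolved stub.
spec:
  kubelet:
    anonymousAuth: false
    resolvConf: /etc/resolv.conf
```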

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

Not needed, as this is not related to a kops command: kops create completed successfully. The issue happens on the control-plane nodes themselves. An extract of the kubelet logs:

kubelet[3014]: I0414 14:42:31.100802    3014 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/kube-apiserver-control-plane-fsn1-1-636428496a76dc30"
Apr 14 14:42:31 control-plane-fsn1-1-636428496a76dc30 kubelet[3014]: E0414 14:42:31.100877    3014 dns.go:284] "Could not open resolv conf file." err="open /run/systemd/resolve/resolv.conf: no such file or directory"
Apr 14 14:42:31 control-plane-fsn1-1-636428496a76dc30 kubelet[3014]: E0414 14:42:31.100911    3014 kuberuntime_sandbox.go:45] "Failed to generate sandbox config for pod" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-apiserver-control-plane-fsn1-1-636428496a76dc30"
Apr 14 14:42:31 control-plane-fsn1-1-636428496a76dc30 kubelet[3014]: E0414 14:42:31.100947    3014 kuberuntime_manager.go:1171] "CreatePodSandbox for pod failed" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube

9. Anything else do we need to know?

Maybe this check in kops' distribution detection needs to be adjusted here:

func (d *Distribution) HasLoopbackEtcResolvConf() bool {
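Rather than deciding from the distribution name alone (which is what HasLoopbackEtcResolvConf appears to feed into), the node code could probe whether the systemd-resolved stub actually exists and fall back to /etc/resolv.conf when it does not. A sketch of that idea (hypothetical, not the actual kops implementation; resolvConfPath is a name I made up):

```go
package main

import (
	"fmt"
	"os"
)

// resolvConfPath returns the resolv.conf the kubelet should read:
// the systemd-resolved stub when it exists, otherwise the plain
// /etc/resolv.conf (e.g. Hetzner's Debian 12 image, where
// systemd-resolved is not installed).
func resolvConfPath() string {
	const stub = "/run/systemd/resolve/resolv.conf"
	if _, err := os.Stat(stub); err == nil {
		return stub // systemd-resolved is running; use its upstream server list
	}
	return "/etc/resolv.conf"
}

func main() {
	fmt.Println("kubelet should read:", resolvConfPath())
}
```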