kubernetes-csi/csi-driver-nfs

GKE 1.24 mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known

igoooor opened this issue · 1 comment

What happened:
I'm using a GKE cluster on GCP, Kubernetes version 1.24. I installed the driver and followed the example from the repository.
The pod mounting the NFS PVC cannot start; here is the output:

Warning  FailedMount        1s (x2 over 1s)  kubelet             MountVolume.SetUp failed for volume "pv-nginx" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o nfsvers=4.1 nfs-server.default.svc.cluster.local:/ /var/lib/kubelet/pods/c69f5a0b-2cf5-47d6-bf69-6f9e66eb1435/volumes/kubernetes.io~csi/pv-nginx/mount
Output: mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known

What you expected to happen:
Volume to be mounted.

How to reproduce it:

Anything else we need to know?:
Before trying csi-driver, I used another approach to NFS (one example is described here: https://medium.com/platformer-blog/nfs-persistent-volumes-with-kubernetes-a-case-study-ce1ed6e2c266)
The method from the Medium post worked until it started failing randomly this morning. I noticed that my nodes were updated overnight, but I'm not sure what the update was about.
Since today, both csi-driver and the other method (from the Medium post) fail with the same error. If I replace the DNS name in the NFS PV with the cluster IP of the NFS server's Service, it works (see the sketch below).
I don't know what else I can say about it, but I'm happy to try anything.
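
For reference, a rough sketch of that workaround, following the static PV example from this repo (the capacity and the cluster IP below are placeholders for my setup). First look up the cluster IP of the in-cluster NFS Service:

kubectl get svc nfs-server -n default -o jsonpath='{.spec.clusterIP}'

and then put that IP into the PV instead of the DNS name:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nginx
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - nfsvers=4.1
  csi:
    driver: nfs.csi.k8s.io
    volumeHandle: pv-nginx  # any ID that is unique in the cluster
    volumeAttributes:
      server: 10.3.240.10   # placeholder: cluster IP of the nfs-server Service, instead of the DNS name
      share: /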

Environment:

  • CSI Driver version: 4.0
  • Kubernetes version (use kubectl version): v1.24.1-gke.1400
  • OS (e.g. from /etc/os-release): Container-optimised OS with containerd (cos_containerd)
  • Kernel (e.g. uname -a):
  • Install tools: helm
  • Others:

So I don't think my issue was related to csi-driver itself, but I managed to build a workaround.
Looking at these two links:
https://cloud.google.com/solutions/automatically-bootstrapping-gke-nodes-with-daemonsets
https://github.com/GoogleCloudPlatform/solutions-gke-init-daemonsets-tutorial
I built the following manifest:

apiVersion: v1
kind: ConfigMap
metadata:
  name: entrypoint
  namespace: node-initializer
  labels:
    app: default-init
data:
  entrypoint.sh: |
    #!/usr/bin/env bash
    set -euo pipefail
    
    # Wait 30 seconds so the node has finished booting before touching its config
    sleep 30

    # The node's root filesystem is mounted into this container at /root (see the DaemonSet below)
    ROOT_MOUNT_DIR="${ROOT_MOUNT_DIR:-/root}"

    echo "Installing dependencies"
    apt-get update
    apt-get install -y curl jq

    # Ask the API server for the cluster IP of the kube-dns Service
    KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    KUBE_DNS_IP=$(curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" "https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_PORT_443_TCP_PORT}/api/v1/namespaces/kube-system/services/kube-dns" | jq -r '.spec.clusterIP')

    # Add kube-dns to systemd-resolved on the node if it is not configured there yet
    if ! grep -q "$KUBE_DNS_IP" "${ROOT_MOUNT_DIR}/etc/systemd/resolved.conf" ; then
        echo "DNS=${KUBE_DNS_IP}" >> "${ROOT_MOUNT_DIR}/etc/systemd/resolved.conf"
        chroot "${ROOT_MOUNT_DIR}" systemctl daemon-reload
        chroot "${ROOT_MOUNT_DIR}" systemctl restart systemd-networkd
        chroot "${ROOT_MOUNT_DIR}" systemctl restart systemd-resolved
    fi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: service-reader
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "watch", "list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-initializer
  namespace: node-initializer
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-services
subjects:
  - kind: ServiceAccount
    name: node-initializer
    namespace: node-initializer
roleRef:
  kind: ClusterRole
  name: service-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-initializer
  namespace: node-initializer
  labels:
    app: default-init
spec:
  selector:
    matchLabels:
      app: default-init
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: node-initializer
        namespace: node-initializer
        app: default-init
    spec:
      automountServiceAccountToken: true
      serviceAccountName: node-initializer
      hostNetwork: true
      hostPID: true
      enableServiceLinks: true
      volumes:
        - name: root-mount
          hostPath:
            path: /
        - name: entrypoint
          configMap:
            name: entrypoint
            defaultMode: 0744
      initContainers:
        - image: ubuntu:18.04
          name: node-initializer
          command: ["/scripts/entrypoint.sh"]
          env:
            - name: ROOT_MOUNT_DIR
              value: /root
          securityContext:
            privileged: true
          volumeMounts:
            - name: root-mount
              mountPath: /root
            - name: entrypoint
              mountPath: /scripts
      containers:
        - image: "gcr.io/google-containers/pause:2.0"
          name: pause

This adds the kube-dns cluster IP to the node's systemd-resolved configuration, which was missing, so my Service DNS names now resolve from the nodes themselves.
I hope this can help other people.
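
Note that the manifest assumes the node-initializer namespace already exists. Creating it and rolling out the DaemonSet looks roughly like this (the file name is just an example):

kubectl create namespace node-initializer
kubectl apply -f node-initializer.yaml
kubectl -n node-initializer rollout status daemonset/node-initializer

Once the init container has run on a node, nfs-server.default.svc.cluster.local should resolve from that node again and the original PV mounts with the DNS name.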