GKE 1.24 mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known
igoooor opened this issue · 1 comment
What happened:
I'm using a GKE cluster on GCP, Kubernetes version 1.24. I installed the driver and followed the example from the repository.
The pod mounting the NFS PVC cannot start; here is the output:
Warning FailedMount 1s (x2 over 1s) kubelet MountVolume.SetUp failed for volume "pv-nginx" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs -o nfsvers=4.1 nfs-server.default.svc.cluster.local:/ /var/lib/kubelet/pods/c69f5a0b-2cf5-47d6-bf69-6f9e66eb1435/volumes/kubernetes.io~csi/pv-nginx/mount
Output: mount.nfs: Failed to resolve server nfs-server.default.svc.cluster.local: Name or service not known
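For what it's worth, the name has to be resolved by the node's own resolver (the mount is run with the host's DNS configuration, not the pod's), and on a stock GKE node that resolver does not know about cluster Services. A quick check, assuming SSH access to a node (the node name is a placeholder):

# On an affected GKE node, e.g.: gcloud compute ssh <node-name>
cat /etc/resolv.conf
# The nameservers listed here are what mount.nfs uses; by default they point at the
# VPC/metadata resolver, which has no records for *.svc.cluster.local names; those
# only exist in kube-dns.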
What you expected to happen:
Volume to be mounted.
How to reproduce it:
- Create a new cluster in GCP using default settings and select Kubernetes version 1.23 (or 1.24)
- Install csi-driver-nfs 4.0 using Helm (a command sketch follows this list)
- Follow the example https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/example/nfs-provisioner/README.md
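For the Helm step, the install I ran looks roughly like this (repo URL and chart version as documented in the project's chart README; adjust namespace and version to your setup):

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm repo update
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.0.0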
Anything else we need to know?:
Before trying csi-driver-nfs, I used another approach to NFS (one example is described here: https://medium.com/platformer-blog/nfs-persistent-volumes-with-kubernetes-a-case-study-ce1ed6e2c266)
The method from the Medium post worked until it started failing this morning, seemingly at random. I noticed that my nodes were updated overnight, but I'm not sure what the update changed.
Since today, both csi-driver-nfs and the other method (from the Medium post) fail with the same error. If I replace the DNS name in the NFS volume definition with the cluster IP of the NFS server's Service, then it works (a sketch of that workaround follows below).
I don't know what else I can say about it, but I'm happy to try anything.
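For reference, the cluster-IP workaround is just the example PV with the server field changed; a sketch (capacity and volumeHandle are placeholders, and <nfs-server-cluster-ip> is the output of kubectl get svc nfs-server -o jsonpath='{.spec.clusterIP}'):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nginx
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  mountOptions:
    - nfsvers=4.1
  csi:
    driver: nfs.csi.k8s.io
    volumeHandle: unique-volumeid  # any ID that is unique within the cluster
    volumeAttributes:
      server: <nfs-server-cluster-ip>  # instead of nfs-server.default.svc.cluster.local
      share: /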
Environment:
- CSI Driver version: 4.0
- Kubernetes version (use kubectl version): v1.24.1-gke.1400
- OS (e.g. from /etc/os-release): Container-Optimized OS with containerd (cos_containerd)
- Kernel (e.g. uname -a):
- Install tools: helm
- Others:
So I don't think my issue was related to csi-driver-nfs itself, but I managed to put together a workaround.
Looking at these two links:
https://cloud.google.com/solutions/automatically-bootstrapping-gke-nodes-with-daemonsets
https://github.com/GoogleCloudPlatform/solutions-gke-init-daemonsets-tutorial
I built the following manifest:
apiVersion: v1
kind: ConfigMap
metadata:
  name: entrypoint
  namespace: node-initializer
  labels:
    app: default-init
data:
  entrypoint.sh: |
    #!/usr/bin/env bash
    set -euo pipefail

    sleep 30

    ROOT_MOUNT_DIR="${ROOT_MOUNT_DIR:-/root}"

    echo "Installing dependencies"
    apt-get update
    apt-get install -y curl jq

    # Ask the API server for the clusterIP of the kube-dns Service.
    KUBE_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    KUBE_DNS_IP=$(curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/kube-system/services/kube-dns | jq -r '.spec.clusterIP')

    # The node's root filesystem is mounted at ${ROOT_MOUNT_DIR}; add kube-dns to
    # systemd-resolved there if it is not configured yet, then restart the resolver.
    if ! grep -q "$KUBE_DNS_IP" "${ROOT_MOUNT_DIR}/etc/systemd/resolved.conf" ; then
      echo "DNS=${KUBE_DNS_IP}" >> "${ROOT_MOUNT_DIR}/etc/systemd/resolved.conf"
      chroot "${ROOT_MOUNT_DIR}" systemctl daemon-reload
      chroot "${ROOT_MOUNT_DIR}" systemctl restart systemd-networkd
      chroot "${ROOT_MOUNT_DIR}" systemctl restart systemd-resolved
    fi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: service-reader
rules:
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "watch", "list"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: node-initializer
  namespace: node-initializer
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-services
subjects:
  - kind: ServiceAccount
    name: node-initializer
    namespace: node-initializer
roleRef:
  kind: ClusterRole
  name: service-reader
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-initializer
  namespace: node-initializer
  labels:
    app: default-init
spec:
  selector:
    matchLabels:
      app: default-init
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: node-initializer
        namespace: node-initializer
        app: default-init
    spec:
      automountServiceAccountToken: true
      serviceAccountName: node-initializer
      hostNetwork: true
      hostPID: true
      enableServiceLinks: true
      volumes:
        - name: root-mount
          hostPath:
            path: /
        - name: entrypoint
          configMap:
            name: entrypoint
            defaultMode: 0744
      initContainers:
        - image: ubuntu:18.04
          name: node-initializer
          command: ["/scripts/entrypoint.sh"]
          env:
            - name: ROOT_MOUNT_DIR
              value: /root
          securityContext:
            privileged: true
          volumeMounts:
            - name: root-mount
              mountPath: /root
            - name: entrypoint
              mountPath: /scripts
      containers:
        - image: "gcr.io/google-containers/pause:2.0"
          name: pause
This adds the kube-dns cluster IP to each node's resolver configuration, which was missing. Now Service DNS names resolve from the nodes.
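To apply it, the node-initializer namespace has to exist first; a rough sequence (the file name and node name below are placeholders):

kubectl create namespace node-initializer
kubectl apply -f node-initializer.yaml
kubectl get pods -n node-initializer -o wide  # one pod per node; each runs the pause container once its init container has finished

# Spot-check a node afterwards: the kube-dns clusterIP should now be in resolved.conf
gcloud compute ssh <node-name> -- grep '^DNS=' /etc/systemd/resolved.conf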
I hope this can help other people.