kubernetes-csi/csi-driver-nfs

Cannot do dynamic provisioning

jackchuong opened this issue · 8 comments

Hi all,
In my environment I have a Synology NAS as the NFS server.
I have an internal k8s cluster with nfs-subdir-external-provisioner installed, and it works fine (dynamic/static PV/PVC, but it doesn't support volume snapshots).
Now I want to try csi-driver-nfs.
I installed the CSI driver with Helm 3, default values:

helm -n nfs-provisioner list
NAME                            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                   APP VERSION
csi-driver-nfs                  nfs-provisioner 1               2023-09-25 15:26:29.58904168 +0700 +07  deployed        csi-driver-nfs-v4.4.0                   v4.4.0
nfs-provisioner-sgnnas03        nfs-provisioner 1               2023-09-07 15:45:35.266905104 +0700 +07 deployed        nfs-subdir-external-provisioner-4.0.18  4.0.2
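
For reference, the install was roughly the following (the repo alias and URL are the ones from the chart's install docs; adjust if yours differ):

helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace nfs-provisioner --version v4.4.0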

My StorageClass and VolumeSnapshotClass:

---
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: storageclass-csi-nfs-sgnnas03
  #annotations:
  #  storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs.csi.k8s.io
parameters:
  server: "192.168.7.107"
  share: /k8s-nfs-sgnnas03
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - hard
  - nfsvers=4.1
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-nfs-snapclass
driver: nfs.csi.k8s.io
deletionPolicy: Delete
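
For the snapshot test later on, a minimal VolumeSnapshot pointing at this class could look like the manifest below (snapshot-test is just a placeholder name; pvc-test is the claim defined further down):

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-test
spec:
  # must match the VolumeSnapshotClass above
  volumeSnapshotClassName: csi-nfs-snapclass
  source:
    persistentVolumeClaimName: pvc-test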

kubectl get csidrivers,sc,VolumeSnapshotClass
NAME                                                ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
csidriver.storage.k8s.io/secrets-store.csi.k8s.io   false            true             false             <unset>         false               Ephemeral    18d
csidriver.storage.k8s.io/nfs.csi.k8s.io             false            false            false             <unset>         false               Persistent   17h

NAME                                                        PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/nfs-sgnnas03 (default)          nfs-provisioner-sgnnas03   Delete          Immediate           true                   18d
storageclass.storage.k8s.io/storageclass-csi-nfs-sgnnas03   nfs.csi.k8s.io             Delete          Immediate           true                   17h

NAME                                                            DRIVER           DELETIONPOLICY   AGE
volumesnapshotclass.snapshot.storage.k8s.io/csi-nfs-snapclass   nfs.csi.k8s.io   Delete           17h

My dynamic PVC:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-test
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: storageclass-csi-nfs-sgnnas03
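
Once the claim binds, I plan to verify the mount with a throwaway pod along these lines (the pod name and nginx image are arbitrary):

---
apiVersion: v1
kind: Pod
metadata:
  name: pod-test
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        # mounts the NFS-backed volume at /mnt/nfs inside the container
        - name: data
          mountPath: /mnt/nfs
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: pvc-test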

It stays in Pending status:

NAME                             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                    AGE
persistentvolumeclaim/pvc-test   Pending                                      storageclass-csi-nfs-sgnnas03   12s

kubectl describe persistentvolumeclaim/pvc-test
Name:          pvc-test
Namespace:     default
StorageClass:  storageclass-csi-nfs-sgnnas03
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
               volume.kubernetes.io/storage-provisioner: nfs.csi.k8s.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type     Reason                Age              From                                                                Message
  ----     ------                ----             ----                                                                -------
  Normal   ExternalProvisioning  4s (x2 over 4s)  persistentvolume-controller                                         waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator
  Normal   Provisioning          1s (x3 over 4s)  nfs.csi.k8s.io_k3s-hq-master1_4caab03c-f211-44c7-bc53-50b8b7c2b9cf  External provisioner is provisioning volume for claim "default/pvc-test"
  Warning  ProvisioningFailed    1s (x3 over 4s)  nfs.csi.k8s.io_k3s-hq-master1_4caab03c-f211-44c7-bc53-50b8b7c2b9cf  failed to provision volume with StorageClass "storageclass-csi-nfs-sgnnas03": rpc error: code = Internal desc = failed to mount nfs server: rpc error: code = Internal desc = mkdir /tmp/pvc-a9d891cc-2ba9-465b-8c57-7afb61dfdf78: read-only file system

How can I troubleshoot this? Please give me some advice, thank you very much.
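
So far I have only pulled the controller-side logs, roughly like this (the deployment name and the app=csi-nfs-controller label come from the default Helm install):

kubectl -n nfs-provisioner get pod -l app=csi-nfs-controller
kubectl -n nfs-provisioner logs deploy/csi-nfs-controller -c nfs
kubectl -n nfs-provisioner describe pod -l app=csi-nfs-controller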

Why would mkdir under /tmp/ (inside the driver controller) fail? Have you set any permissions on the driver controller?

Internal desc = mkdir /tmp/pvc-a9d891cc-2ba9-465b-8c57-7afb61dfdf78: read-only file system

Hi @andyzhangx
I didn't set any permissions on the driver controller; I installed the CSI driver with Helm 3, default values.
Because the logs are quite long (I tested some VolumeSnapshot & VolumeSnapshotContent objects before), I decided to force-delete the csi-nfs-controller, csi-nfs-node, and snapshot-controller pods and create the dynamic PVC again to capture the exact logs.
But now everything works fine. I can create dynamic and static PV/PVC successfully.
I have only one concern, about a warning on the csi-nfs-controller pod:

kubectl -n nfs-provisioner get pod
NAME                                                              READY   STATUS    RESTARTS   AGE
nfs-provisioner-sgnnas03-nfs-subdir-external-provisioner-8vv4gb   1/1     Running   0          18d
csi-nfs-controller-86548b95f7-z7qp2                               4/4     Running   0          40m
csi-nfs-node-66dvn                                                3/3     Running   0          40m
snapshot-controller-7c47788447-xw276                              1/1     Running   0          40m

kubectl -n nfs-provisioner describe pod csi-nfs-controller-86548b95f7-z7qp2
Name:                 csi-nfs-controller-86548b95f7-z7qp2
Namespace:            nfs-provisioner
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      csi-nfs-controller-sa
Node:                 k3s-hq-master1/192.168.7.200
Start Time:           Tue, 26 Sep 2023 09:24:29 +0700
Labels:               app=csi-nfs-controller
                      app.kubernetes.io/instance=csi-driver-nfs
                      app.kubernetes.io/managed-by=Helm
                      app.kubernetes.io/name=csi-driver-nfs
                      app.kubernetes.io/version=v4.4.0
                      helm.sh/chart=csi-driver-nfs-v4.4.0
                      pod-template-hash=86548b95f7
Annotations:          <none>
Status:               Running
SeccompProfile:       RuntimeDefault
IP:                   192.168.7.200
IPs:
  IP:           192.168.7.200
Controlled By:  ReplicaSet/csi-nfs-controller-86548b95f7
Containers:
  csi-provisioner:
    Container ID:  containerd://2db0759d03659a7d557959762dfd294b9c145d8526c3206b722db19d59ba97ea
    Image:         registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
    Image ID:      registry.k8s.io/sig-storage/csi-provisioner@sha256:d078dc174323407e8cc6f0f9abd4efaac5db27838f1564d0253d5e3233e3f17f
    Port:          <none>
    Host Port:     <none>
    Args:
      -v=2
      --csi-address=$(ADDRESS)
      --leader-election
      --leader-election-namespace=nfs-provisioner
      --extra-create-metadata=true
    State:          Running
      Started:      Tue, 26 Sep 2023 09:24:30 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  400Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:
      ADDRESS:  /csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-57p5d (ro)
  csi-snapshotter:
    Container ID:  containerd://60fcab37cc0aae7757925230ca8f02481e05ae0a475ec3ac4f88a27d553a1faa
    Image:         registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2
    Image ID:      registry.k8s.io/sig-storage/csi-snapshotter@sha256:becc53e25b96573f61f7469923a92fb3e9d3a3781732159954ce0d9da07233a2
    Port:          <none>
    Host Port:     <none>
    Args:
      --v=2
      --csi-address=$(ADDRESS)
      --leader-election-namespace=nfs-provisioner
      --leader-election
    State:          Running
      Started:      Tue, 26 Sep 2023 09:24:31 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  200Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:
      ADDRESS:  /csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-57p5d (ro)
  liveness-probe:
    Container ID:  containerd://858d1b86561ac48692bff7965b6f67aff5759ba9e92c68b6cbdb989dce074d70
    Image:         registry.k8s.io/sig-storage/livenessprobe:v2.10.0
    Image ID:      registry.k8s.io/sig-storage/livenessprobe@sha256:4dc0b87ccd69f9865b89234d8555d3a614ab0a16ed94a3016ffd27f8106132ce
    Port:          <none>
    Host Port:     <none>
    Args:
      --csi-address=/csi/csi.sock
      --probe-timeout=3s
      --health-port=29652
      --v=2
    State:          Running
      Started:      Tue, 26 Sep 2023 09:24:31 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  100Mi
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /csi from socket-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-57p5d (ro)
  nfs:
    Container ID:  containerd://a501d4a91b904e5572972862054e400e2fd43533de77ff8a9f44d0b44bf9841b
    Image:         registry.k8s.io/sig-storage/nfsplugin:v4.4.0
    Image ID:      registry.k8s.io/sig-storage/nfsplugin@sha256:971cb526bc0108cca7d8efda32102629cb362f99bbd4879fd381bea1b2014ab4
    Port:          29652/TCP
    Host Port:     29652/TCP
    Args:
      --v=5
      --nodeid=$(NODE_ID)
      --endpoint=$(CSI_ENDPOINT)
      --drivername=nfs.csi.k8s.io
      --mount-permissions=0
      --working-mount-dir=/tmp
      --default-ondelete-policy=delete
    State:          Running
      Started:      Tue, 26 Sep 2023 09:24:32 +0700
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  200Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Liveness:  http-get http://:healthz/healthz delay=30s timeout=10s period=30s #success=1 #failure=5
    Environment:
      NODE_ID:        (v1:spec.nodeName)
      CSI_ENDPOINT:  unix:///csi/csi.sock
    Mounts:
      /csi from socket-dir (rw)
      /tmp from tmp-dir (rw)
      /var/lib/kubelet/pods from pods-mount-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-57p5d (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  pods-mount-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:  Directory
  socket-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-57p5d:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/controlplane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  40m   default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports..
  Normal   Scheduled         40m   default-scheduler  Successfully assigned nfs-provisioner/csi-nfs-controller-86548b95f7-z7qp2 to k3s-hq-master1
  Normal   Pulled            40m   kubelet            Container image "registry.k8s.io/sig-storage/csi-provisioner:v3.5.0" already present on machine
  Normal   Created           40m   kubelet            Created container csi-provisioner
  Normal   Started           40m   kubelet            Started container csi-provisioner
  Normal   Pulled            40m   kubelet            Container image "registry.k8s.io/sig-storage/csi-snapshotter:v6.2.2" already present on machine
  Normal   Created           40m   kubelet            Created container csi-snapshotter
  Normal   Started           40m   kubelet            Started container csi-snapshotter
  Normal   Pulled            40m   kubelet            Container image "registry.k8s.io/sig-storage/livenessprobe:v2.10.0" already present on machine
  Normal   Created           40m   kubelet            Created container liveness-probe
  Normal   Started           40m   kubelet            Started container liveness-probe
  Normal   Pulled            40m   kubelet            Container image "registry.k8s.io/sig-storage/nfsplugin:v4.4.0" already present on machine
  Normal   Created           40m   kubelet            Created container nfs
  Normal   Started           40m   kubelet            Started container nfs
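
Out of curiosity I also checked that /tmp inside the nfs container is writable now (it is backed by the tmp-dir emptyDir listed above); assuming the plugin image ships a shell, a quick check is:

kubectl -n nfs-provisioner exec csi-nfs-controller-86548b95f7-z7qp2 -c nfs -- \
  sh -c 'touch /tmp/write-test && ls -ld /tmp && rm /tmp/write-test'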

This is csi-nfs-controller.log.

I0926 02:24:30.726169       1 feature_gate.go:249] feature gates: &{map[]}
I0926 02:24:30.726366       1 csi-provisioner.go:154] Version: v3.5.0
I0926 02:24:30.726380       1 csi-provisioner.go:177] Building kube configs for running in cluster...
I0926 02:24:32.833655       1 common.go:111] Probing CSI driver for readiness
I0926 02:24:32.840484       1 csi-provisioner.go:230] Detected CSI driver nfs.csi.k8s.io
I0926 02:24:32.843864       1 csi-provisioner.go:302] CSI driver does not support PUBLISH_UNPUBLISH_VOLUME, not watching VolumeAttachments
I0926 02:24:32.845873       1 controller.go:732] Using saving PVs to API server in background
I0926 02:24:32.846938       1 leaderelection.go:245] attempting to acquire leader lease nfs-provisioner/nfs-csi-k8s-io...
I0926 02:24:52.779355       1 leaderelection.go:255] successfully acquired lease nfs-provisioner/nfs-csi-k8s-io
I0926 02:24:52.779803       1 leader_election.go:178] became leader, starting
I0926 02:24:52.880380       1 controller.go:811] Starting provisioner controller nfs.csi.k8s.io_k3s-hq-master1_6603e3a6-a878-417e-819d-875e2dc1b01a!
I0926 02:24:52.880470       1 clone_controller.go:66] Starting CloningProtection controller
I0926 02:24:52.880647       1 volume_store.go:97] Starting save volume queue
I0926 02:24:52.880738       1 clone_controller.go:82] Started CloningProtection controller
I0926 02:24:52.981758       1 controller.go:860] Started provisioner controller nfs.csi.k8s.io_k3s-hq-master1_6603e3a6-a878-417e-819d-875e2dc1b01a!
I0926 02:27:16.460472       1 controller.go:1359] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": started
I0926 02:27:16.460780       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-test", UID:"0af9008e-e778-4896-a00b-baea28041232", APIVersion:"v1", ResourceVersion:"1980825", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/pvc-test"
I0926 02:27:17.859174       1 controller.go:923] successfully created PV pvc-0af9008e-e778-4896-a00b-baea28041232 for PVC pvc-test and csi volume name 192.168.7.107#k8s-nfs-sgnnas03#pvc-0af9008e-e778-4896-a00b-baea28041232##
I0926 02:27:17.859250       1 controller.go:1442] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": volume "pvc-0af9008e-e778-4896-a00b-baea28041232" provisioned
I0926 02:27:17.859270       1 controller.go:1455] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": succeeded
I0926 02:27:17.866664       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-test", UID:"0af9008e-e778-4896-a00b-baea28041232", APIVersion:"v1", ResourceVersion:"1980825", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-0af9008e-e778-4896-a00b-baea28041232
I0926 02:34:37.196452       1 controller.go:1502] delete "pvc-0af9008e-e778-4896-a00b-baea28041232": started
I0926 02:34:37.196521       1 controller.go:1279] volume pvc-0af9008e-e778-4896-a00b-baea28041232 does not need any deletion secrets
I0926 02:34:37.306536       1 controller.go:1517] delete "pvc-0af9008e-e778-4896-a00b-baea28041232": volume deleted
I0926 02:34:37.318770       1 controller.go:1562] delete "pvc-0af9008e-e778-4896-a00b-baea28041232": persistentvolume deleted succeeded
I0926 02:35:16.719867       1 controller.go:1359] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": started
I0926 02:35:16.720125       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-test", UID:"25747ee0-1f9a-4ce8-b3d6-1dd5f5284393", APIVersion:"v1", ResourceVersion:"1981731", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/pvc-test"
I0926 02:35:16.806455       1 controller.go:923] successfully created PV pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393 for PVC pvc-test and csi volume name 192.168.7.107#k8s-nfs-sgnnas03#pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393##
I0926 02:35:16.806520       1 controller.go:1442] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": volume "pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393" provisioned
I0926 02:35:16.806541       1 controller.go:1455] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": succeeded
I0926 02:35:16.813133       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-test", UID:"25747ee0-1f9a-4ce8-b3d6-1dd5f5284393", APIVersion:"v1", ResourceVersion:"1981731", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393
I0926 02:36:24.270225       1 controller.go:1502] delete "pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393": started
I0926 02:36:24.270295       1 controller.go:1279] volume pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393 does not need any deletion secrets
I0926 02:36:24.382536       1 controller.go:1517] delete "pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393": volume deleted
I0926 02:36:24.392834       1 controller.go:1562] delete "pvc-25747ee0-1f9a-4ce8-b3d6-1dd5f5284393": persistentvolume deleted succeeded
I0926 02:38:44.339074       1 controller.go:1359] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": started
I0926 02:38:44.339304       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-test", UID:"7665caae-aec4-4225-be4c-e8afde29a3b9", APIVersion:"v1", ResourceVersion:"1982148", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/pvc-test"
I0926 02:38:44.419087       1 controller.go:923] successfully created PV pvc-7665caae-aec4-4225-be4c-e8afde29a3b9 for PVC pvc-test and csi volume name 192.168.7.107#k8s-nfs-sgnnas03#pvc-7665caae-aec4-4225-be4c-e8afde29a3b9##
I0926 02:38:44.419174       1 controller.go:1442] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": volume "pvc-7665caae-aec4-4225-be4c-e8afde29a3b9" provisioned
I0926 02:38:44.419207       1 controller.go:1455] provision "default/pvc-test" class "storageclass-csi-nfs-sgnnas03": succeeded
I0926 02:38:44.427127       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"pvc-test", UID:"7665caae-aec4-4225-be4c-e8afde29a3b9", APIVersion:"v1", ResourceVersion:"1982148", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-7665caae-aec4-4225-be4c-e8afde29a3b9

What warning? Do you mean this, @jackchuong?

0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports..

Yes, I mean that. My internal k8s cluster (single node) is set up with k3s, is that related?

Yes, I mean that. My internal k8s cluster (single node) is set up with k3s, is that related?

@jackchuong yes, each controller pod should run on a separate node; since you have two replicas, it requires at least two nodes.

@andyzhangx so should I just ignore it, or change the Helm chart values?

controller:
  runOnMaster: false
  runOnControlPlane: true
  nodeSelector:
    node-role.kubernetes.io/control-plane: "true"

kubectl get nodes --show-labels
NAME             STATUS   ROLES                  AGE   VERSION        LABELS
k3s-hq-master1   Ready    control-plane,master   18d   v1.27.5+k3s1   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=k3s,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k3s-hq-master1,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=true,node-role.kubernetes.io/master=true,node.kubernetes.io/instance-type=k3s

Just ignore it, since the controller is running now.
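
For a single-node cluster, another option might be to run only one controller replica, if your chart version exposes controller.replicas (check the chart's values.yaml); assuming the repo alias csi-driver-nfs, something like:

helm -n nfs-provisioner upgrade csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
  --version v4.4.0 --reuse-values --set controller.replicas=1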