kubernetes-csi/csi-driver-nfs

csi-snapshotter complains noisily about missing CRDs in the logs when the externalSnapshotter Helm value is disabled

Closed this issue · 10 comments

I deployed observability to my cluster and noticed a significant amount of log entries coming from csi-snapshotter:

E0724 04:37:31.925736       1 reflector.go:140] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: volumesnapshotcontents.snapshot.storage.k8s.io is forbidden: User "system:serviceaccount:csinfs:csi-nfs-controller-sa" cannot list resource "volumesnapshotcontents" in API group "snapshot.storage.k8s.io" at the cluster scope

Approx 900 lines in 8 hours.

The Helm chart's only modified value is:

  values:
    externalSnapshotter:
      enabled: false

I'm not sure what the csi-snapshotter is doing if we have disabled externalSnapshotter - should it still be there? Or should we install the CRDs anyway, even without the container, to prevent the noisy log messages?

Environment:

  • CSI Driver version: Helm chart v4.8.0
  • Kubernetes version (use kubectl version): 1.29.2
  • OS (e.g. from /etc/os-release): Linux
  • Kernel (e.g. uname -a): Talos Linux v1.6.6
  • Install tools: Helm
  • Others:

@MysticalMount
What is the output of kubectl get clusterrole nfs-external-provisioner-role -o yaml? The system:serviceaccount:csinfs:csi-nfs-controller-sa should already have the following permissions:

- apiGroups: ["snapshot.storage.k8s.io"]
  resources: ["volumesnapshotcontents"]
  verbs: ["get", "list", "watch", "update", "patch"]
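
You can also verify the effective permission directly by impersonating the service account (using the csinfs namespace from your report):

kubectl auth can-i list volumesnapshotcontents.snapshot.storage.k8s.io --as=system:serviceaccount:csinfs:csi-nfs-controller-sa

If this prints yes, the RBAC side is in place; a no points at a missing or stale ClusterRoleBinding.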

Hi Andy, I have uninstalled and re-installed the Helm chart a few times, so perhaps that is when the issue occurs; here is the output of the clusterrole. I had heard that upgrading a Helm chart from a previous version doesn't always update the CRDs - so perhaps it's that (I did come from v4.4.0 to v4.8.0).

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    meta.helm.sh/release-name: csinfs
    meta.helm.sh/release-namespace: csinfs
  creationTimestamp: "2024-07-25T14:50:48Z"
  labels:
    app.kubernetes.io/instance: csinfs
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: csi-driver-nfs
    app.kubernetes.io/version: v4.8.0
    helm.sh/chart: csi-driver-nfs-v4.8.0
    helm.toolkit.fluxcd.io/name: csinfs
    helm.toolkit.fluxcd.io/namespace: csinfs
  name: nfs-external-provisioner-role
  resourceVersion: "58106543"
  uid: b23ddafb-89f1-48fe-9a6b-b3b1ca631ad8
rules:
- apiGroups:
  - ""
  resources:
  - persistentvolumes
  verbs:
  - get
  - list
  - watch
  - create
  - delete
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  verbs:
  - get
  - list
  - watch
  - update
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - snapshot.storage.k8s.io
  resources:
  - volumesnapshotclasses
  - volumesnapshots
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - snapshot.storage.k8s.io
  resources:
  - volumesnapshotcontents
  verbs:
  - get
  - list
  - watch
  - update
  - patch
- apiGroups:
  - snapshot.storage.k8s.io
  resources:
  - volumesnapshotcontents/status
  verbs:
  - get
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
- apiGroups:
  - storage.k8s.io
  resources:
  - csinodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get

Can you check why the user "system:serviceaccount:csinfs:csi-nfs-controller-sa" cannot list the resource "volumesnapshotcontents" in API group "snapshot.storage.k8s.io" at the cluster scope? For example, check the csinfs:csi-nfs-controller-sa service account and its bindings - this is not related to the CRDs. And those logs appear in the csi-snapshotter sidecar of csi-nfs-controller, right?
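
For example, to see which bindings actually reference that service account (adjust the namespace if yours differs):

kubectl -n csinfs get serviceaccount csi-nfs-controller-sa
kubectl get clusterrolebinding -o wide | grep csi-nfs-controller-sa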

Hello,

I am experiencing a similar issue as described here. I am using MicroK8s without any snapshot controllers installed, which rightly means that the snapshot-related CRDs (VolumeSnapshot, VolumeSnapshotClass, VolumeSnapshotContent) do not exist in my environment. Given this setup, the continuous logging of errors from the csi-snapshotter seems unnecessary as snapshot functionality is not applicable.

Is there a way to disable the csi-snapshotter container in scenarios where snapshot functionality is not being used or intended? Any guidance on this would be greatly appreciated.
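
For what it's worth, I could probably strip the sidecar manually with a strategic merge patch along these lines (just a sketch - it assumes the controller Deployment is named csi-nfs-controller and the sidecar container csi-snapshotter, and Helm would put the container back on the next upgrade), but a proper chart option would be much cleaner:

kubectl -n <release-namespace> patch deployment csi-nfs-controller --type strategic -p '{"spec":{"template":{"spec":{"containers":[{"name":"csi-snapshotter","$patch":"delete"}]}}}}'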

Thank you!

Try reinstalling the NFS driver if you have upgraded the CSI driver, similar to kubernetes-csi/external-snapshotter#975 (comment).

Hi,
I am facing the same issue as described here:

E0821 02:12:03.111339       1 reflector.go:150] github.com/kubernetes-csi/external-snapshotter/client/v8/informers/externalversions/factory.go:142: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)

I have the same issue here, with csi-driver-nfs installed through its Helm chart (without enabling externalSnapshotter).

Could it simply come from the fact that the Helm chart currently deploys the CRD only if externalSnapshotter is enabled? See the if statement here: https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/charts/latest/csi-driver-nfs/templates/crd-csi-snapshot.yaml#L1
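
For reference, the guard in that template looks roughly like this (paraphrased sketch, not copied verbatim from the chart):

{{- if .Values.externalSnapshotter.enabled -}}
# VolumeSnapshotClass / VolumeSnapshot / VolumeSnapshotContent CRD manifests
{{- end -}}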

Even if this CRD is also needed by the csi-snapshotter pod of the controller:
https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/charts/latest/csi-driver-nfs/templates/csi-nfs-controller.yaml#L78 ?

Should this be fixed in the Helm chart?

A workaround is to deploy the CRD manually:

kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/deploy/v4.8.0/crd-csi-snapshot.yaml

(this is for v4.8.0: adapt for the version you use)
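
Afterwards you can confirm the snapshot CRDs exist, which should make the reflector errors stop:

kubectl get crd volumesnapshotclasses.snapshot.storage.k8s.io volumesnapshotcontents.snapshot.storage.k8s.io volumesnapshots.snapshot.storage.k8s.io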

However, it's not clean to mix the Helm chart and a manual deployment like this: Helm will complain if it later tries to deploy this CRD itself (because of a fix for this issue, or a configuration change you made), since it will not find its labels on the CRD and refuses to overwrite an object it did not previously manage. So use this workaround only if you know what you're doing.

Any update here?

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

One fix would be to add a new flag, e.g. --set controller.enableSnapshotter=false, for the case where you don't need the snapshot function at all.
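
As a rough illustration (controller.enableSnapshotter is only a proposed value name here, not an existing chart option), the controller template could wrap the sidecar like this:

# values.yaml (proposed)
controller:
  enableSnapshotter: false

# templates/csi-nfs-controller.yaml - sketch of the guard around the sidecar
{{- if .Values.controller.enableSnapshotter }}
        - name: csi-snapshotter
          # ... existing csi-snapshotter container spec unchanged ...
{{- end }}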