hpe-storage/csi-driver

hpe csi driver nfs mount errors


OCP 4.12
HPE CSI driver 2.4.0

We're trying to deploy IBM TNCP (proviso), and only some of the deployment's pods are failing with:

  Warning  FailedMount  80m (x2 over 88m)   kubelet            Unable to attach or mount volumes: unmounted volumes=[logs pack-content], unattached volumes=[logs pack-content kube-api-access-pzsdp keystore-security sessions-security work-pack]: timed out waiting for the condition
  Warning  FailedMount  31m (x6 over 62m)   kubelet            Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[kube-api-access-pzsdp keystore-security sessions-security work-pack logs pack-content]: timed out waiting for the condition
  Warning  FailedMount  10m (x4 over 55m)   kubelet            Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[logs pack-content kube-api-access-pzsdp keystore-security sessions-security work-pack]: timed out waiting for the condition
  Warning  FailedMount  6m31s               kubelet            Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[pack-content kube-api-access-pzsdp keystore-security sessions-security work-pack logs]: timed out waiting for the condition
  Warning  FailedMount  21s (x60 over 88m)  kubelet            (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[sessions-security work-pack logs pack-content kube-api-access-pzsdp keystore-security]: timed out waiting for the condition
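
To map the unmounted volume names (pack-content, logs) to their PVCs, we can read them out of the pod spec (a sketch; the pod name and namespace are placeholders):

    # Print each volume name and the PVC it refers to, one per line
    oc get pod <failing-pod> -n <app-namespace> \
      -o jsonpath='{range .spec.volumes[*]}{.name}{"\t"}{.persistentVolumeClaim.claimName}{"\n"}{end}'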

In the related hpe-csi-driver pod we see this:

time="2024-03-06T10:53:33Z" level=error msg="GRPC error: rpc error: code = Internal desc = Error mounting nfs share 172.30.57.231:/export at /var/lib/kubelet/pods/a22c9293-363a-40ba-9f05-fdc975e776f9/volumes/kubernetes.io~csi/pvc-259d76de-990e-43b7-b8de-4c28d87580c7/mount, err error command mount with pid: 2393 killed as timeout of 60 seconds reached" file="utils.go:73"
time="2024-03-06T10:54:19Z" level=error msg="\n Error in GetSecondaryBackends unexpected end of JSON input" file="volume.go:87"
time="2024-03-06T10:54:19Z" level=error msg="\n Passed details " file="volume.go:88"
time="2024-03-06T10:55:50Z" level=error msg="command mount with pid: 2424 killed as timeout of 60 seconds reached" file="cmd.go:60"
time="2024-03-06T10:55:50Z" level=error msg="GRPC error: rpc error: code = Internal desc = Error mounting nfs share 172.30.25.60:/export at /var/lib/kubelet/pods/29023752-be2c-499d-b92d-72373b423188/volumes/kubernetes.io~csi/pvc-69fe50cd-2f74-44b4-bb13-dc8c51add505/mount, err error command mount with pid: 2424 killed as timeout of 60 seconds reached" file="utils.go:73"
time="2024-03-06T10:56:35Z" level=error msg="command mount with pid: 2429 killed as timeout of 60 seconds reached" file="cmd.go:60"

Please help us debug this and/or point us to the relevant issue.
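
In case it helps, this is how we plan to test one of the exports by hand from an affected node (a sketch; the node name is a placeholder, and the NFS server pods may live in a different namespace depending on the install; hpe-nfs is the driver's default):

    # Check the NFS server pods that back the RWX claims
    oc get pods -n hpe-nfs -o wide

    # From a debug shell on the affected node, try the mount manually
    oc debug node/<node-name>
    chroot /host
    mount -t nfs -o vers=4 172.30.57.231:/export /mnt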

# oc get sc -o yaml hpe-nfs
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "4"
  creationTimestamp: "2023-11-23T17:31:40Z"
  labels:
    app.kubernetes.io/instance: bru11-nprod-be-01-infra
  name: hpe-nfs
  resourceVersion: "62518251"
  uid: afc44a3f-6ba2-4ac7-864a-9106dbd01173
parameters:
  accessProtocol: fc
  allowMutations: hostSeesVLUN
  csi.storage.k8s.io/controller-expand-secret-name: hpe-backend
  csi.storage.k8s.io/controller-expand-secret-namespace: hpe-storage
  csi.storage.k8s.io/controller-publish-secret-name: hpe-backend
  csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/node-publish-secret-name: hpe-backend
  csi.storage.k8s.io/node-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-stage-secret-name: hpe-backend
  csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
  description: Volume created by the HPE CSI Driver for Kubernetes
  fsMode: "0777"
  hostSeesVLUN: "true"
  nfsResources: "true"
provisioner: csi.hpe.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
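
For isolation, we can also create a minimal RWX claim against this class to reproduce the mount outside of the TNCP deployment (a sketch; the claim name is arbitrary):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: hpe-nfs-mount-test
    spec:
      accessModes:
        - ReadWriteMany   # RWX is what triggers the driver's NFS server provisioning
      resources:
        requests:
          storage: 1Gi
      storageClassName: hpe-nfs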

thanks

We've noticed that automatic upgrades were enabled for the operator, and it's now in a broken state:

      Name:       certified-operators
      Namespace:  openshift-marketplace
    Identifier:   hpe-csi-operator.v2.4.1
    Path:         registry.connect.redhat.com/hpestorage/csi-driver-operator-bundle@sha256:b5f87d6a9c7ec3a4d53204e86d0fa57d10d4aba2eeb1882f0a1b1caa19c7d9fd
    Properties:   {"properties":[{"type":"olm.gvk","value":{"group":"storage.hpe.com","kind":"HPECSIDriver","version":"v1"}},{"type":"olm.package","value":{"packageName":"hpe-csi-operator","version":"2.4.1"}}]}
    Replaces:     hpe-csi-operator.v2.4.0
  Catalog Sources:
  Conditions:
    Last Transition Time:  2024-03-04T17:51:58Z
    Last Update Time:      2024-03-04T17:51:58Z
    Message:               error validating existing CRs against new CRD's schema for "hpecsidrivers.storage.hpe.com": error validating custom resource against new schema for HPECSIDriver hpe-csi-driver/csi-driver: [].spec.disable.alletraStorageMP: Required value

Can you advise what to do to resolve this? 2.4.0 is in the Replacing state and 2.4.1 is Pending.
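
For reference, this is how we're inspecting the upgrade state (we're assuming the operator namespace is hpe-storage, the same one that holds the backend secrets):

    # CSV phases show which version is Replacing/Pending/Failed
    oc get csv -n hpe-storage

    # The Subscription and its InstallPlan carry the failure condition quoted above
    oc get subscription -n hpe-storage
    oc get installplan -n hpe-storage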

We only found this: https://github.com/hpe-storage/csi-driver/blob/master/release-notes/v2.4.1.md, and it isn't even listed under the repository's releases.

Should we edit the old CRD to include disable.alletraStorageMP?
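
The validation error above is about our existing custom resource failing the new CRD's schema, so presumably the field would go on the HPECSIDriver resource itself; something like this (an untested sketch; the resource name and namespace are taken from the error message, and false as the value is our guess at the 2.4.1 default):

    # Merge-patch the existing CR to add the field the new schema requires
    oc patch hpecsidriver csi-driver -n hpe-csi-driver --type merge \
      -p '{"spec":{"disable":{"alletraStorageMP":false}}}'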

Or should we follow this?
https://scod.hpedev.io/partners/redhat_openshift/index.html#upgrading

The operator is currently broken. We're working with Red Hat to have it resolved.

The 2.4.0 release has been restored.

How do we get out of the broken state, though?
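
For example, would it be safe to delete the failed InstallPlan so OLM regenerates one against the restored catalog entry (a sketch; the plan name is a placeholder and the namespace is an assumption)?

    # Find the failed plan for the 2.4.1 attempt, then delete it
    oc get installplan -n hpe-storage
    oc delete installplan <failed-plan-name> -n hpe-storage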