HPE CSI driver NFS mount errors
OCP 4.12
HPE CSI driver 2.4.0
We're trying to deploy IBM TNCP (proviso) and only some pods of the deployment are failing with:
Warning FailedMount 80m (x2 over 88m) kubelet Unable to attach or mount volumes: unmounted volumes=[logs pack-content], unattached volumes=[logs pack-content kube-api-access-pzsdp keystore-security sessions-security work-pack]: timed out waiting for the condition
Warning FailedMount 31m (x6 over 62m) kubelet Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[kube-api-access-pzsdp keystore-security sessions-security work-pack logs pack-content]: timed out waiting for the condition
Warning FailedMount 10m (x4 over 55m) kubelet Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[logs pack-content kube-api-access-pzsdp keystore-security sessions-security work-pack]: timed out waiting for the condition
Warning FailedMount 6m31s kubelet Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[pack-content kube-api-access-pzsdp keystore-security sessions-security work-pack logs]: timed out waiting for the condition
Warning FailedMount 21s (x60 over 88m) kubelet (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[pack-content], unattached volumes=[sessions-security work-pack logs pack-content kube-api-access-pzsdp keystore-security]: timed out waiting for the condition
In the logs of the related hpe-csi-driver pod we see this:
time="2024-03-06T10:53:33Z" level=error msg="GRPC error: rpc error: code = Internal desc = Error mounting nfs share 172.30.57.231:/export at /var/lib/kubelet/pods/a22c9293-363a-40ba-9f05-fdc975e776f9/volumes/kubernetes.io~csi/pvc-259d76de-990e-43b7-b8de-4c28d87580c7/mount, err error command mount with pid: 2393 killed as timeout of 60 seconds reached" file="utils.go:73"
time="2024-03-06T10:54:19Z" level=error msg="\n Error in GetSecondaryBackends unexpected end of JSON input" file="volume.go:87"
time="2024-03-06T10:54:19Z" level=error msg="\n Passed details " file="volume.go:88"
time="2024-03-06T10:55:50Z" level=error msg="command mount with pid: 2424 killed as timeout of 60 seconds reached" file="cmd.go:60"
time="2024-03-06T10:55:50Z" level=error msg="GRPC error: rpc error: code = Internal desc = Error mounting nfs share 172.30.25.60:/export at /var/lib/kubelet/pods/29023752-be2c-499d-b92d-72373b423188/volumes/kubernetes.io~csi/pvc-69fe50cd-2f74-44b4-bb13-dc8c51add505/mount, err error command mount with pid: 2424 killed as timeout of 60 seconds reached" file="utils.go:73"
time="2024-03-06T10:56:35Z" level=error msg="command mount with pid: 2429 killed as timeout of 60 seconds reached" file="cmd.go:60"
Please help us debug this and/or point us to the relevant issue.
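For context, our understanding is that with nfsResources: "true" the driver fronts each RWX claim with its own NFS server (a deployment plus a service), so the 172.30.x.x addresses in the errors should be the ClusterIPs of those services. These are the checks we have been running so far (assuming the NFS servers live in the default hpe-nfs namespace; adjust if nfsNamespace is set in the StorageClass):
# oc get deploy,pod -n hpe-nfs (are the NFS server pods backing the failing PVCs actually Running?)
# oc get svc -n hpe-nfs | grep 172.30.57.231 (which NFS service owns the ClusterIP from the mount error?)
# oc get events -n hpe-nfs --sort-by=.lastTimestamp (anything wrong with the NFS server deployments themselves?)
For reference, this is the StorageClass in use: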
# oc get sc -o yaml hpe-nfs
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "4"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"allowVolumeExpansion":true,"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"argocd.argoproj.io/sync-wave":"4"},"labels":{"app.kubernetes.io/instance":"bru11-nprod-be-01-infra"},"name":"hpe-nfs"},"parameters":{"accessProtocol":"fc","allowMutations":"hostSeesVLUN","csi.storage.k8s.io/controller-expand-secret-name":"hpe-backend","csi.storage.k8s.io/controller-expand-secret-namespace":"hpe-storage","csi.storage.k8s.io/controller-publish-secret-name":"hpe-backend","csi.storage.k8s.io/controller-publish-secret-namespace":"hpe-storage","csi.storage.k8s.io/fstype":"xfs","csi.storage.k8s.io/node-publish-secret-name":"hpe-backend","csi.storage.k8s.io/node-publish-secret-namespace":"hpe-storage","csi.storage.k8s.io/node-stage-secret-name":"hpe-backend","csi.storage.k8s.io/node-stage-secret-namespace":"hpe-storage","csi.storage.k8s.io/provisioner-secret-name":"hpe-backend","csi.storage.k8s.io/provisioner-secret-namespace":"hpe-storage","description":"Volume created by the HPE CSI Driver for Kubernetes","fsMode":"0777","hostSeesVLUN":"true","nfsResources":"true"},"provisioner":"csi.hpe.com","reclaimPolicy":"Delete","volumeBindingMode":"Immediate"}
  creationTimestamp: "2023-11-23T17:31:40Z"
  labels:
    app.kubernetes.io/instance: bru11-nprod-be-01-infra
  name: hpe-nfs
  resourceVersion: "62518251"
  uid: afc44a3f-6ba2-4ac7-864a-9106dbd01173
parameters:
  accessProtocol: fc
  allowMutations: hostSeesVLUN
  csi.storage.k8s.io/controller-expand-secret-name: hpe-backend
  csi.storage.k8s.io/controller-expand-secret-namespace: hpe-storage
  csi.storage.k8s.io/controller-publish-secret-name: hpe-backend
  csi.storage.k8s.io/controller-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/node-publish-secret-name: hpe-backend
  csi.storage.k8s.io/node-publish-secret-namespace: hpe-storage
  csi.storage.k8s.io/node-stage-secret-name: hpe-backend
  csi.storage.k8s.io/node-stage-secret-namespace: hpe-storage
  csi.storage.k8s.io/provisioner-secret-name: hpe-backend
  csi.storage.k8s.io/provisioner-secret-namespace: hpe-storage
  description: Volume created by the HPE CSI Driver for Kubernetes
  fsMode: "0777"
  hostSeesVLUN: "true"
  nfsResources: "true"
provisioner: csi.hpe.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
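Since the mount itself is executed by the CSI node plugin on the worker, we also plan to test plain TCP reachability of the NFS ClusterIP from the node that reported the timeout. The node name below is a placeholder and the IP is the one from our events; curl should be available on RHCOS, so we use it as a simple connection test:
# oc debug node/<worker-node> -- chroot /host curl -v --max-time 5 telnet://172.30.57.231:2049 (should print "Connected to 172.30.57.231" if the NFS port is reachable from that node)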
thanks
We've noticed that automatic upgrades were enabled for the operator, and it is now in a broken state.
Name: certified-operators
Namespace: openshift-marketplace
Identifier: hpe-csi-operator.v2.4.1
Path: registry.connect.redhat.com/hpestorage/csi-driver-operator-bundle@sha256:b5f87d6a9c7ec3a4d53204e86d0fa57d10d4aba2eeb1882f0a1b1caa19c7d9fd
Properties: {"properties":[{"type":"olm.gvk","value":{"group":"storage.hpe.com","kind":"HPECSIDriver","version":"v1"}},{"type":"olm.package","value":{"packageName":"hpe-csi-operator","version":"2.4.1"}}]}
Replaces: hpe-csi-operator.v2.4.0
Catalog Sources:
Conditions:
Last Transition Time: 2024-03-04T17:51:58Z
Last Update Time: 2024-03-04T17:51:58Z
Message: error validating existing CRs against new CRD's schema for "hpecsidrivers.storage.hpe.com": error validating custom resource against new schema for HPECSIDriver hpe-csi-driver/csi-driver: [].spec.disable.alletraStorageMP: Required value
Can you advise what to do to resolve this? 2.4.0 is in the Replacing state and 2.4.1 is Pending.
We only found this: https://github.com/hpe-storage/csi-driver/blob/master/release-notes/v2.4.1.md (it isn't even listed under the GitHub releases).
Should we edit the old CRD to include disable.alletraStorageMP?
Or should we follow the upgrade procedure here:
https://scod.hpedev.io/partners/redhat_openshift/index.html#upgrading
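In the meantime, this is how we are inspecting the stuck upgrade (the operator namespace below is a placeholder; in our cluster the HPECSIDriver CR lives in hpe-csi-driver):
# oc get sub,installplan,csv -n <operator-namespace> (shows the Subscription, the pending InstallPlan, and the 2.4.0/2.4.1 CSV phases)
# oc describe installplan -n <operator-namespace> (the CRD validation error quoted above shows up in the failing InstallPlan's conditions)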
The operator is currently broken. We're working with Red Hat to have it resolved.
The 2.4.0 release has been restored.
How do we get out of the broken state, though?
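If it helps, what we are considering as a workaround (not applied yet, and only a guess on our side; the subscription name and namespace below are assumptions for our cluster) is pinning the operator back to 2.4.0 until the catalog entry is fixed, roughly:
# oc patch sub hpe-csi-operator -n <operator-namespace> --type merge -p '{"spec":{"installPlanApproval":"Manual"}}' (stop OLM from re-attempting the broken upgrade automatically)
# oc delete installplan <pending-installplan> -n <operator-namespace> (drop the InstallPlan that fails CRD validation)
# oc delete csv hpe-csi-operator.v2.4.1 -n <operator-namespace> (only if a 2.4.1 CSV is stuck in Pending, so 2.4.0 stays the active CSV)
Is that a sane approach, or is there a supported way?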