kubernetes-sigs/gcp-compute-persistent-disk-csi-driver

Some volumeattachments.storage.k8s.io are not cleaned up, so the PV cannot be removed

GregoireW opened this issue

Hello,

I have a GKE cluster (1.22.8-gke.200) with the CSI driver at version 0.11.6 (image version v1.5.1-gke.0).
At around the time I got this new CSI driver version, unused/detached VolumeAttachments started to pile up (or I had simply never noticed it before):

$ k get volumeattachments.storage.k8s.io | grep my-volume
csi-01a199df9d14f127c79d5aab20c4a2663b0c25ca1f28727c67a0881bc5b4e4a6   pd.csi.storage.gke.io   my-volume   gke-my-nodepool-f0ef438e-8bsg   true       4d22h
csi-228d7aa699688c0eea08516bd646c5906473836917d02eb584123a29ace770a7   pd.csi.storage.gke.io   my-volume   gke-my-nodepool-71a8ed9e-nbkj   true       5d
csi-44459803df3899df85756f82dfc6d885f196b8795dc7b6170e9f53bbf2bab36f   pd.csi.storage.gke.io   my-volume   gke-my-nodepool-71a8ed9e-wbl8   true       3d
csi-6922e0a0843ee7b27f8c06dbb2ee60f0dfac8fe48cf6cf32b21287060a4c653e   pd.csi.storage.gke.io   my-volume   gke-my-nodepool-f0ef438e-7pk2   true       38h
...

Because these attachments cannot be removed, the PV cannot be removed either and stays in the "Terminating" state.

The only way I found to delete the PV was to remove its finalizer:

  finalizers:
  - external-attacher/pd-csi-storage-gke-io
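The finalizer removal above can be scripted. A minimal sketch, assuming `kubectl` is on the PATH; the helper name `clear_pv_finalizers_cmd` is hypothetical. It only builds a `kubectl patch` command that applies a JSON merge patch setting `metadata.finalizers` to null. Note that this bypasses the external-attacher's own cleanup, so it may leave the underlying disk attached if the detach never actually happened.

```python
from __future__ import annotations

import json


def clear_pv_finalizers_cmd(pv_name: str) -> list[str]:
    """Build a kubectl command that removes all finalizers from a PV.

    Hypothetical helper. With a JSON merge patch, setting "finalizers"
    to null deletes the field from the object.
    """
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["kubectl", "patch", "pv", pv_name, "--type=merge", "-p", patch]


if __name__ == "__main__":
    # Only prints the command; it does not contact the cluster.
    print(" ".join(clear_pv_finalizers_cmd("my-volume")))
```

To actually apply it, the returned list can be passed to `subprocess.run(cmd, check=True)` against the current kube context.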

Describing one of these VolumeAttachments gives (abridged):

$ k describe volumeattachments.storage.k8s.io csi-01a199df9d14f127c79d5aab20c4a2663b0c25ca1f28727c67a0881bc5b4e4a6

Name:         csi-01a199df9d14f127c79d5aab20c4a2663b0c25ca1f28727c67a0881bc5b4e4a6
Namespace:    
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: projects/......
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Finalizers:
    external-attacher/pd-csi-storage-gke-io
  ...
Spec:
  ...
Status:
  Attached:  true
  Detach Error:
    Message:  rpc error: code = Unavailable desc = Request queued due to error condition on node
    Time:     2022-05-23T18:24:53Z
Events:       <none>

It looks as if the driver could not run the detach on the node, so this VolumeAttachment is kept indefinitely.

It may be related to my use case: my node pool uses preemptible nodes, so nodes can be terminated quite abruptly. If the node no longer exists, the detach cannot run on it.
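The preemptible-node hypothesis can be checked against the API objects. A minimal sketch, assuming you feed it the JSON output of `kubectl get volumeattachments -o json` and `kubectl get nodes -o json`; the function name `stale_attachments` is hypothetical. It reads the standard VolumeAttachment fields `spec.nodeName`, `spec.source.persistentVolumeName` and `status.detachError`, and flags attachments whose node is gone or that report a detach error.

```python
from __future__ import annotations

import json


def stale_attachments(va_json: str, nodes_json: str,
                      pv_name: str | None = None) -> list[dict]:
    """List VolumeAttachments whose node is missing or that carry a detach error.

    Hypothetical helper. va_json / nodes_json are the raw outputs of
    `kubectl get volumeattachments -o json` and `kubectl get nodes -o json`.
    If pv_name is given, only attachments for that PV are considered.
    """
    live_nodes = {n["metadata"]["name"] for n in json.loads(nodes_json)["items"]}
    result = []
    for va in json.loads(va_json)["items"]:
        spec = va.get("spec", {})
        status = va.get("status", {})
        pv = spec.get("source", {}).get("persistentVolumeName")
        if pv_name is not None and pv != pv_name:
            continue
        node = spec.get("nodeName")
        # detachError is absent (or null) when no detach failure was recorded.
        detach_error = (status.get("detachError") or {}).get("message")
        node_missing = node not in live_nodes
        if node_missing or detach_error:
            result.append({
                "name": va["metadata"]["name"],
                "pv": pv,
                "node": node,
                "node_missing": node_missing,
                "detach_error": detach_error,
            })
    return result
```

If most flagged attachments have `node_missing` set, that would support the idea that the detach is stuck because the preempted node is gone.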

Thank you

This looks like a duplicate of #987; see that issue.

/close (duplicate)

Oops... I searched for "VolumeAttachments", not "VolumeAttachment"...