kubernetes-retired/external-storage

[cephfs] PVC deletion issues with Ceph 14

xriser opened this issue · 8 comments

With Ceph v14, the provisioner does not delete the volume from Ceph, nor the corresponding user secret.
The error looks like the following:
I0321 18:04:38.854980 1 controller.go:1158] delete "pvc-1a61b1c9-9dcf-41a7-b8fc-183799545396": started
E0321 18:04:39.936090 1 cephfs-provisioner.go:268] failed to delete share "tst-pvc" for "k8s.default.tst-pvc", err: exit status 1, output: Traceback (most recent call last):
  File "/usr/local/bin/cephfs_provisioner", line 364, in <module>
    main()
  File "/usr/local/bin/cephfs_provisioner", line 360, in main
    cephfs.delete_share(share, user)
  File "/usr/local/bin/cephfs_provisioner", line 319, in delete_share
    self._deauthorize(volume_path, user_id)
  File "/usr/local/bin/cephfs_provisioner", line 260, in _deauthorize
    pool_name = self.volume_client._get_ancestor_xattr(path, "ceph.dir.layout.pool")
  File "/lib/python2.7/site-packages/ceph_volume_client.py", line 756, in _get_ancestor_xattr
    result = self.fs.getxattr(path, attr)
  File "cephfs.pyx", line 954, in cephfs.LibCephFS.getxattr (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.1/rpm/el7/BUILD/ceph-13.2.1/build/src/pybind/cephfs/pyrex/cephfs.c:10083)
cephfs.ObjectNotFound: [Errno 2] error in getxattr
E0321 18:04:39.936184 1 controller.go:1181] delete "pvc-1a61b1c9-9dcf-41a7-b8fc-183799545396": volume deletion failed: exit status 1
W0321 18:04:39.936357 1 controller.go:787] Retrying syncing volume "pvc-1a61b1c9-9dcf-41a7-b8fc-183799545396" because failures 0 < threshold 15
E0321 18:04:39.936437 1 controller.go:802] error syncing volume "pvc-1a61b1c9-9dcf-41a7-b8fc-183799545396": exit status 1
I0321 18:04:39.936499 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-1a61b1c9-9dcf-41a7-b8fc-183799545396", UID:"13a518c8-4512-4bfe-a643-2fac09dd06b5", APIVersion:"v1", ResourceVersion:"492116", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' exit status 1
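For context, the call that blows up can be reproduced outside the provisioner with the python-cephfs binding. This is only a minimal sketch: the ceph.conf location, the credentials, and the share path `/volumes/kubernetes/tst-pvc` are assumptions you would replace with whatever your storage class actually uses.

```python
# Minimal sketch (not the provisioner's code): reproduce the getxattr call that
# _deauthorize() fails on, to check whether the share directory still exists and
# carries the expected layout attribute. Paths and conffile are assumptions.
import cephfs

fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')  # assumes an admin keyring is configured
fs.mount()
try:
    # ceph_volume_client._get_ancestor_xattr() ultimately calls getxattr on the
    # share path; ObjectNotFound means the path the provisioner computed is not there.
    pool = fs.getxattr('/volumes/kubernetes/tst-pvc', 'ceph.dir.layout.pool')
    print('share exists, data pool:', pool.decode())
except cephfs.ObjectNotFound:
    print('share path does not exist on this filesystem')
finally:
    fs.shutdown()
```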

I tried Mimic and Nautilus, rebuilt the provisioner Docker image from the latest code, tried different images, and checked the fattr as @zhoubofsy mentioned in #860; nothing helped.
I also checked the secrets and so on.

The issue is as follows: the PVC provisions fine, and the client secret is created in the cephfs namespace as expected.
When I delete the PVC, it is removed, but the PV remains in Released status, the data stays in the Ceph storage, the client secret is not deleted, and the provisioner pod logs the error above.
I can then delete the PV and the user secret by hand, which works, but the data still remains in the Ceph storage.
If I then create a PVC with the same name, it comes up with the previously stored data.
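Until deletion works, the orphaned shares can at least be located by hand. A rough sketch with the same binding, again assuming the default `/volumes/kubernetes` volume group prefix (adjust to whatever your provisioner / CEPH_VOLUME_GROUP setting maps to):

```python
# Rough sketch: list leftover share directories under the volume group root so
# they can be cleaned up manually. '/volumes/kubernetes' is an assumption.
import cephfs

fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf')
fs.mount()
d = fs.opendir('/volumes/kubernetes')
entry = fs.readdir(d)
while entry:
    name = entry.d_name.decode()
    if name not in ('.', '..'):
        print(name)  # each entry left here is an orphaned PVC share
    entry = fs.readdir(d)
fs.closedir(d)
fs.shutdown()
```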
E0322 13:16:42.082854 1 cephfs-provisioner.go:272] failed to delete share "data-elasticsearch-elasticsearch-master-1" for "k8s.efk.data-elasticsearch-elasticsearch-master-1", err: exit status 1, output: Traceback (most recent call last):
  File "/usr/local/bin/cephfs_provisioner", line 364, in <module>
    main()
  File "/usr/local/bin/cephfs_provisioner", line 360, in main
    cephfs.delete_share(share, user)
  File "/usr/local/bin/cephfs_provisioner", line 319, in delete_share
    self._deauthorize(volume_path, user_id)
  File "/usr/local/bin/cephfs_provisioner", line 260, in _deauthorize
    pool_name = self.volume_client._get_ancestor_xattr(path, "ceph.dir.layout.pool")
  File "/lib/python2.7/site-packages/ceph_volume_client.py", line 800, in _get_ancestor_xattr
    result = self.fs.getxattr(path, attr).decode()
  File "cephfs.pyx", line 1099, in cephfs.LibCephFS.getxattr (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/build/src/pybind/cephfs/pyrex/cephfs.c:11926)
cephfs.ObjectNotFound: error in getxattr: No such file or directory [Errno 2]
E0322 13:16:42.083877 1 controller.go:1120] delete "pvc-be69ffe3-6925-4408-a160-701ef72e44cc": volume deletion failed: exit status 1
W0322 13:16:42.083999 1 controller.go:726] Retrying syncing volume "pvc-be69ffe3-6925-4408-a160-701ef72e44cc" because failures 0 < threshold 15
E0322 13:16:42.084050 1 controller.go:741] error syncing volume "pvc-be69ffe3-6925-4408-a160-701ef72e44cc": exit status 1
I0322 13:16:42.084105 1 controller.go:1097] delete "pvc-89f664c4-82d5-4d7d-b189-e2cf1d084908": started
I0322 13:16:42.084316 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-be69ffe3-6925-4408-a160-701ef72e44cc", UID:"f2aa4d67-4fbd-43ee-9632-608fb13c1f5d", APIVersion:"v1", ResourceVersion:"440534", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' exit status 1
E0322 13:16:42.127103 1 cephfs-provisioner.go:272] failed to delete share "data-elasticsearch-elasticsearch-data-0" for "k8s.efk.data-elasticsearch-elasticsearch-data-0", err: exit status 1, output: Traceback (most recent call last):
  File "/usr/local/bin/cephfs_provisioner", line 364, in <module>
    main()
  File "/usr/local/bin/cephfs_provisioner", line 360, in main
    cephfs.delete_share(share, user)
  File "/usr/local/bin/cephfs_provisioner", line 319, in delete_share
    self._deauthorize(volume_path, user_id)
  File "/usr/local/bin/cephfs_provisioner", line 260, in _deauthorize
    pool_name = self.volume_client._get_ancestor_xattr(path, "ceph.dir.layout.pool")
  File "/lib/python2.7/site-packages/ceph_volume_client.py", line 800, in _get_ancestor_xattr
    result = self.fs.getxattr(path, attr).decode()
  File "cephfs.pyx", line 1099, in cephfs.LibCephFS.getxattr (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.8/rpm/el7/BUILD/ceph-14.2.8/build/src/pybind/cephfs/pyrex/cephfs.c:11926)
cephfs.ObjectNotFound: error in getxattr: No such file or directory [Errno 2]
E0322 13:16:42.127316 1 controller.go:1120] delete "pvc-d0c9a302-68a0-42b9-a1f3-23b3ccc931d8": volume deletion failed: exit status 1
W0322 13:16:42.127666 1 controller.go:726] Retrying syncing volume "pvc-d0c9a302-68a0-42b9-a1f3-23b3ccc931d8" because failures 0 < threshold 15
E0322 13:16:42.128670 1 controller.go:741] error syncing volume "pvc-d0c9a302-68a0-42b9-a1f3-23b3ccc931d8": exit status 1

I encountered the same problem

@AlawnWong this project is outdated and no longer supported.
I have switched to Ceph-CSI: https://github.com/ceph/ceph-csi
It works perfectly.

@xriser
which Ceph version did you use?
This error seems to happen with Ceph Luminous, but not with Ceph Nautilus.

@Ranler, as I said, I have tried Mimic and Nautilus.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

I have found the solution: the CEPH_VOLUME_GROUP environment variable is not set when deleting the PV.
The fix is at: fc016bc
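For anyone hitting the same thing, here is a minimal sketch of why a missing CEPH_VOLUME_GROUP at delete time can produce exactly this ObjectNotFound. The helper below is illustrative only, not the actual cephfs_provisioner code, and the "kubernetes" default group is an assumption:

```python
import os

# Illustrative only: the share directory lives under a volume-group component.
# If CEPH_VOLUME_GROUP was set in the pod that provisioned the share but is
# missing in the pod that deletes it, the two computed paths diverge, and
# getxattr("ceph.dir.layout.pool") runs against a directory that never existed.
DEFAULT_GROUP = 'kubernetes'  # assumed default group

def share_path(share, group=None):
    group = group or os.environ.get('CEPH_VOLUME_GROUP', DEFAULT_GROUP)
    return '/volumes/%s/%s' % (group, share)

print(share_path('tst-pvc', group='myteam'))  # path used at provision time: /volumes/myteam/tst-pvc
print(share_path('tst-pvc'))                  # path used at delete time:    /volumes/kubernetes/tst-pvc -> ObjectNotFound
```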

Thanks for reporting the issue!

This repo is no longer being maintained and we are in the process of archiving this repo. Please see kubernetes/org#1563 for more details.

If your issue relates to nfs provisioners, please create a new issue in https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner or https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner.

Going to close this issue in order to archive this repo. Apologies for the churn and thanks for your patience! 🙏