Unable to mount volumes for pod XX. list of unattached/unmounted volumes=YYYY
mattshma opened this issue · 4 comments
Looking into the cause, the documented explanation is:
When the PVC protection alpha feature is enabled, if a user deletes a PVC in active use by a pod, the PVC is not removed immediately. PVC removal is postponed until the PVC is no longer actively used by any pods.
The k8s log shows the following:
Mar 22 14:34:52 kubelet[2715]: E0322 14:34:52.534778 2715 desired_state_of_world_populator.go:273] Error processing volume "jupyter" for pod "jupyter-2zvlc(a726c9c6-2d9a-11e8-b5f0-005056b76c14)": error processing PVC "k8s"/"jupyter-
"k8s"/"jupyter": PVC k8s/jupyter has non-bound phase ("Pending") or empty pvc.Spec.VolumeName ("")
Mar 22 14:34:52 kubelet[2715]: E0322 14:34:52.734273 2715 desired_state_of_world_populator.go:273] Error processing volume "jupyter" for pod "jupyter_2zvlc(a726c9c6-2d9a-11e8-b5f0-005056b76c14)": error processing PVC "k8s"/"jupyter": PVC k8s/jupyter has non-bound phase ("Pending") or empty pvc.Spec.VolumeName ("")
Checking on the K8S master:
# kubectl -n NAMESPACE describe pod POD_NAME
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m default-scheduler Successfully assigned jupyter to xxxx
Warning FailedAttachVolume 2m attachdetach-controller Multi-Attach error for volume "jupyter" Volume is already exclusively attached to one node and can't be attached to another
Warning FailedMount 2s kubelet,xxx Unable to mount volumes for pod "jupyter(df1d9172-2e48-11e8-ba93-005056b75104)": timeout expired waiting for volumes to attach/mount for pod "k8s"/"jupyter".
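I suspected a ceph problem, so I first checked the ceph lock with rbd lock list kube/jupyter. There was no output, which means there is no lock. With no leads, I looked at the kubelet log again with journalctl -xe -u kubelet: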
Mar 23 11:19:48 xxx kubelet[18992]: I0323 11:19:48.892975 18992 rbd_util.go:273] rbd image kube/jupyter still being used
Mar 23 11:19:48 xxx kubelet[18992]: E0323 11:19:48.893128 18992 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/rbd/[xxxxx]:jupyter\"" failed. No retries permitted until 2018-03-23 11:20:52.893058902 +0800 CST m=+50438.570203365 (durationBeforeRetry 1m4s). Error: "MountVolume.WaitForAttach failed for volume \"jupyter\" (UniqueName: \"kubernetes.io/rbd/[xxx]:jupyter\") pod \"jupyter-7bd54668c7-5496r\" (UID: \"df1d9172-2e48-11e8-ba93-005056b75104\") : rbd image kube/jupyter is still being used. rbd output: Watchers:\n\twatcher=xxxxx:0/3785365512 client.20052960 cookie=18446462598732840961\n"
Found the key evidence! Check the rados watcher:
$ rbd info kube/jupyter
rbd image 'jupyter':
size 30720 MB in 7680 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.1217303d4abff7
format: 2
features: layering
flags:
// the rbd_header object suffix is the ID that follows "rbd_data." in block_name_prefix
$ rados listwatchers -p kube rbd_header.1217303d4abff7
watcher=10.10.18.30:0/3785365512 client.20052960 cookie=18446462598732840961
$ ceph osd blacklist ls
listed 0 entries
$ ceph osd blacklist add 10.10.18.30:0/3785365512
blacklisting 10.10.18.30:0/3785365512 until 2018-03-23 14:02:15.706177 (3600 sec)
$ ceph osd blacklist ls
listed 1 entries
10.10.18.30:0/3785365512 2018-03-23 14:02:15.706177
$ rados listwatchers -p kube rbd_header.1217303d4abff7
$ ceph osd blacklist rm 10.10.18.30:0/3785365512
un-blacklisting 10.10.18.30:0/3785365512
$ ceph osd blacklist ls
listed 0 entries
$ rados listwatchers -p kube rbd_header.1217303d4abff7
After the steps above, I started the container again and it succeeded!
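The lookup above (read block_name_prefix from rbd info, then query the matching rbd_header object for watchers) can be scripted. A minimal sketch, assuming the output formats shown in this issue; the function names rbd_header_from_info and watcher_addrs are made up for this note and are not part of ceph:

```shell
# Hypothetical helpers (not part of ceph): pure text filters over command output.

# Read `rbd info POOL/IMAGE` output on stdin and print the matching
# rbd_header object name (rbd_data.<id> becomes rbd_header.<id>).
rbd_header_from_info() {
    sed -n 's/^[[:space:]]*block_name_prefix:[[:space:]]*rbd_data\.\([^[:space:]]*\).*$/rbd_header.\1/p'
}

# Read `rados listwatchers` output on stdin and print one watcher
# address per line (the value after "watcher=").
watcher_addrs() {
    sed -n 's/^[[:space:]]*watcher=\([^[:space:]]*\).*$/\1/p'
}

# Example wiring (commented out: it needs a live cluster):
# header=$(rbd info kube/jupyter | rbd_header_from_info)
# rados listwatchers -p kube "$header" | watcher_addrs
```

Each printed address is what was passed to ceph osd blacklist add above. Blacklisting cuts that client off from the cluster, so it is worth reviewing the address by hand rather than piping it straight into the command. As far as I know, newer Ceph releases spell the subcommand blocklist instead of blacklist.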
UPDATE: possible cause 2
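None of the steps above helped. I happened to notice that this rbd image had previously been mounted on another machine. If it can be unmounted there, umount it. Otherwise, try to unmap the rbd image. In my case the unmap failed, and I found that the earlier mount command had hung; in the end I had to reboot that machine, which resolved the problem.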
UPDATE: possible cause 3
The image is not mapped on the corresponding host. After running sudo rbd map IMAGE -p POOLNAME, the volume became usable.
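Whether the image is already mapped on a host can be checked before running rbd map. A minimal sketch, assuming the plain-text rbd showmapped layout "id pool image snap device" with no namespace value in between; mapped_device is a hypothetical helper name, not an rbd command:

```shell
# Hypothetical helper: read `rbd showmapped` output on stdin and print the
# device for the given pool and image. Exit status is 1 if no row matches.
# NR > 1 skips the header row; $NF is the last column (the device path).
mapped_device() {
    awk -v pool="$1" -v image="$2" '
        NR > 1 && $2 == pool && $3 == image { print $NF; found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Example wiring (commented out: it needs a host with the rbd client):
# rbd showmapped | mapped_device kube jupyter || sudo rbd map jupyter -p kube
```

If the image lives in a non-default rbd namespace, the column positions shift and this filter would need adjusting.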
Error:
timeout expired waiting for volumes to attach/mount for pod xxxx. list of unattached/unmounted volumes=[xxxxx]
Run rbd showmapped to find the corresponding device, then run fsck /dev/rbdN. The disk reported errors, which I repaired with sudo e2fsck -y /dev/rbdN.
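When scripting the fsck and e2fsck steps above, the exit status says whether anything was found or fixed. It is a bit mask, as documented in the e2fsck(8) man page. A minimal sketch that decodes it; describe_fsck_status is a hypothetical helper name:

```shell
# Hypothetical helper: print one line per bit set in an e2fsck exit status.
# Bit meanings are taken from the e2fsck(8) man page.
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((code & 1)) -ne 0 ] && echo "file system errors corrected"
    [ $((code & 2)) -ne 0 ] && echo "file system errors corrected, system should be rebooted"
    [ $((code & 4)) -ne 0 ] && echo "file system errors left uncorrected"
    [ $((code & 8)) -ne 0 ] && echo "operational error"
    [ $((code & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ] && echo "canceled by user request"
    [ $((code & 128)) -ne 0 ] && echo "shared library error"
    return 0
}

# Example wiring (commented out: it needs a mapped, unmounted device):
# sudo e2fsck -y /dev/rbdN; describe_fsck_status $?
```

A status with bit 4 set after e2fsck -y means the repair did not fully succeed and the volume probably should not be handed back to the pod yet.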
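If e2fsck takes too long, the cause is that the files on the volume are too large. You can mount the device first and move the large files to a backup location, then unmount it and run the e2fsck repair: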
mount /dev/rbdN /mnt
mv /mnt/BIG_FILE /bak_dir