Listing a directory returns errors when S3 file objects have been deleted
shuaiyy opened this issue · 4 comments
shuaiyy commented
ls ./
ls: cannot access 'part-00000-37151a50-6143-4f01-a42b-b3fa555d6ab0-c000.csv': No such file or directory
ls: cannot access 'part-00000-aa2e7279-1b77-4b60-84fa-fbd34e6d9764-c000.csv': No such file or directory
ls: cannot access 'part-00000-ccdf038f-034a-4c9f-89e3-f9071624fd4f-c000.csv': No such file or directory
ls: cannot access 'part-00000-d0338d69-6b69-4cf3-a93d-7dceb47e36f1-c000.csv': No such file or directory
part-00000-0f1be43a-423e-48f3-9976-a6e2987e50d6-c000.csv
part-00000-37151a50-6143-4f01-a42b-b3fa555d6ab0-c000.csv
part-00000-aa2e7279-1b77-4b60-84fa-fbd34e6d9764-c000.csv
part-00000-ccdf038f-034a-4c9f-89e3-f9071624fd4f-c000.csv
part-00000-d0338d69-6b69-4cf3-a93d-7dceb47e36f1-c000.csv
- Some files in S3 were deleted a few days ago, before I ran ls. A second ls works fine.
- GeeseFS version: 0.34.4
- The mount info:
[root@ip-10-171-152-236 /]# cat /proc/mounts | grep geese
data-sg03-data-mining-hive: /var/lib/kubelet/pods/e323b95d-61d6-4f31-b1b5-ed8bb94af311/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount fuse.geesefs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
data-sg03-data-mining-hive: /var/lib/kubelet/pods/c7c9eee4-a694-453d-a3fb-7529f2db1839/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount fuse.geesefs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
[root@ip-10-171-152-236 /]# ps auxww | grep geese
root 21034 0.1 0.0 0 0 ? Zs Feb14 68:43 [geesefs] <defunct>
root 22044 0.0 0.0 0 0 ? Zs Feb24 1:54 [geesefs] <defunct>
root 22343 0.0 0.0 0 0 ? Zs Feb24 1:55 [geesefs] <defunct>
root 24430 0.0 0.0 0 0 ? Zs Feb24 2:04 [geesefs] <defunct>
root 25217 0.0 0.0 0 0 ? Zs Feb24 2:00 [geesefs] <defunct>
root 41855 0.0 0.0 0 0 ? Zs Feb21 0:58 [geesefs] <defunct>
root 42368 0.0 0.0 0 0 ? Zs Feb21 0:58 [geesefs] <defunct>
root 43095 0.0 0.0 0 0 ? Zs Feb21 0:58 [geesefs] <defunct>
root 44776 0.0 0.0 0 0 ? Zs Feb21 0:58 [geesefs] <defunct>
root 46148 0.0 0.0 0 0 ? Zs Feb27 1:50 [geesefs] <defunct>
root 50968 0.0 0.0 0 0 ? Zs Feb24 0:37 [geesefs] <defunct>
root 70150 0.0 0.0 0 0 ? Zs Mar10 3:06 [geesefs] <defunct>
root 70799 0.0 0.0 0 0 ? Zs Mar10 5:32 [geesefs] <defunct>
root 72906 0.7 0.5 9078920 4445864 ? Ssl Mar13 41:02 /usr/bin/geesefs --endpoint https://s3.ap-southeast-1.amazonaws.com -o allow_other --log-file /dev/stderr --no-checksum --memory-limit 4000 --max-flushers 32 --max-parallel-parts 32 --part-sizes 25 --dir-mode 0777 --file-mode 0666 data-sg03-data-mining-hive: /var/lib/kubelet/pods/e323b95d-61d6-4f31-b1b5-ed8bb94af311/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount
root 80622 0.0 0.0 0 0 ? Zs Feb21 1:00 [geesefs] <defunct>
root 81137 0.0 0.0 0 0 ? Zs Feb21 0:59 [geesefs] <defunct>
root 81897 0.0 0.0 0 0 ? Zs Feb21 1:00 [geesefs] <defunct>
root 82696 0.0 0.0 0 0 ? Zs Feb21 0:59 [geesefs] <defunct>
root 86364 0.1 0.0 729152 34656 ? Ssl 08:02 0:00 /usr/bin/geesefs --endpoint https://s3.ap-southeast-1.amazonaws.com -o allow_other --log-file /dev/stderr --no-checksum --memory-limit 4000 --max-flushers 32 --max-parallel-parts 32 --part-sizes 25 --dir-mode 0777 --file-mode 0666 data-sg03-data-mining-hive: /var/lib/kubelet/pods/c7c9eee4-a694-453d-a3fb-7529f2db1839/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount
root 90348 0.0 0.0 119428 996 ? S+ 08:06 0:00 grep --color=auto geese
root 90531 0.0 0.0 0 0 ? Zs Feb24 0:58 [geesefs] <defunct>
root 90837 0.0 0.0 0 0 ? Zs Feb24 0:58 [geesefs] <defunct>
vitalif commented
Hi, that seems correct: at the time of the first listing the kernel still has those inodes cached, and it only drops them once GeeseFS rechecks their existence on the server and replies ENOENT to the kernel.
shuaiyy commented
Thanks for your reply.
Is there a way to avoid this error? Many of our jobs fail because os.listdir()
returns the names of deleted objects.
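Until the filesystem itself invalidates stale entries, one application-side mitigation is to tolerate them: skip any name whose stat fails with FileNotFoundError, since that entry is already gone on the server. A minimal sketch (the helper name list_existing is hypothetical, not a GeeseFS API):

```python
import os


def list_existing(path):
    """List a directory, skipping entries that vanish between
    readdir and stat (e.g. stale kernel dentries on a FUSE mount).

    Workaround sketch only; by the second listing the kernel cache
    has usually been invalidated and plain os.listdir() is clean.
    """
    alive = []
    for name in os.listdir(path):
        try:
            # lstat forces a lookup; on a stale entry GeeseFS
            # rechecks the server and the kernel returns ENOENT
            os.lstat(os.path.join(path, name))
            alive.append(name)
        except FileNotFoundError:
            continue  # object was deleted in S3; drop the stale name
    return alive
```

This trades one extra lookup per entry for resilience, which may matter on very large directories.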
vitalif commented
I think we can leverage FUSE_NOTIFY_DELETE / FUSE_NOTIFY_INVAL_ENTRY to prevent such errors. It requires patching the go-fuse library, though; I'll look into it.
vitalif commented
Check the newest release, 0.35.0; it has kernel notifications.