yandex-cloud/geesefs

Listing a directory returns errors when S3 objects have been deleted

shuaiyy opened this issue · 4 comments

 ls ./
ls: cannot access 'part-00000-37151a50-6143-4f01-a42b-b3fa555d6ab0-c000.csv': No such file or directory
ls: cannot access 'part-00000-aa2e7279-1b77-4b60-84fa-fbd34e6d9764-c000.csv': No such file or directory
ls: cannot access 'part-00000-ccdf038f-034a-4c9f-89e3-f9071624fd4f-c000.csv': No such file or directory
ls: cannot access 'part-00000-d0338d69-6b69-4cf3-a93d-7dceb47e36f1-c000.csv': No such file or directory
part-00000-0f1be43a-423e-48f3-9976-a6e2987e50d6-c000.csv
part-00000-37151a50-6143-4f01-a42b-b3fa555d6ab0-c000.csv
part-00000-aa2e7279-1b77-4b60-84fa-fbd34e6d9764-c000.csv
part-00000-ccdf038f-034a-4c9f-89e3-f9071624fd4f-c000.csv
part-00000-d0338d69-6b69-4cf3-a93d-7dceb47e36f1-c000.csv
  1. Some files in S3 were deleted a few days ago, before I ran ls; a second ls works fine.
  2. GeeseFS version 0.34.4.
  3. The mount info:
[root@ip-10-171-152-236 /]# cat /proc/mounts | grep geese
data-sg03-data-mining-hive: /var/lib/kubelet/pods/e323b95d-61d6-4f31-b1b5-ed8bb94af311/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount fuse.geesefs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
data-sg03-data-mining-hive: /var/lib/kubelet/pods/c7c9eee4-a694-453d-a3fb-7529f2db1839/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount fuse.geesefs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
[root@ip-10-171-152-236 /]# ps auxww | grep geese
root     21034  0.1  0.0      0     0 ?        Zs   Feb14  68:43 [geesefs] <defunct>
root     22044  0.0  0.0      0     0 ?        Zs   Feb24   1:54 [geesefs] <defunct>
root     22343  0.0  0.0      0     0 ?        Zs   Feb24   1:55 [geesefs] <defunct>
root     24430  0.0  0.0      0     0 ?        Zs   Feb24   2:04 [geesefs] <defunct>
root     25217  0.0  0.0      0     0 ?        Zs   Feb24   2:00 [geesefs] <defunct>
root     41855  0.0  0.0      0     0 ?        Zs   Feb21   0:58 [geesefs] <defunct>
root     42368  0.0  0.0      0     0 ?        Zs   Feb21   0:58 [geesefs] <defunct>
root     43095  0.0  0.0      0     0 ?        Zs   Feb21   0:58 [geesefs] <defunct>
root     44776  0.0  0.0      0     0 ?        Zs   Feb21   0:58 [geesefs] <defunct>
root     46148  0.0  0.0      0     0 ?        Zs   Feb27   1:50 [geesefs] <defunct>
root     50968  0.0  0.0      0     0 ?        Zs   Feb24   0:37 [geesefs] <defunct>
root     70150  0.0  0.0      0     0 ?        Zs   Mar10   3:06 [geesefs] <defunct>
root     70799  0.0  0.0      0     0 ?        Zs   Mar10   5:32 [geesefs] <defunct>
root     72906  0.7  0.5 9078920 4445864 ?     Ssl  Mar13  41:02 /usr/bin/geesefs --endpoint https://s3.ap-southeast-1.amazonaws.com -o allow_other --log-file /dev/stderr --no-checksum --memory-limit 4000 --max-flushers 32 --max-parallel-parts 32 --part-sizes 25 --dir-mode 0777 --file-mode 0666 data-sg03-data-mining-hive: /var/lib/kubelet/pods/e323b95d-61d6-4f31-b1b5-ed8bb94af311/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount
root     80622  0.0  0.0      0     0 ?        Zs   Feb21   1:00 [geesefs] <defunct>
root     81137  0.0  0.0      0     0 ?        Zs   Feb21   0:59 [geesefs] <defunct>
root     81897  0.0  0.0      0     0 ?        Zs   Feb21   1:00 [geesefs] <defunct>
root     82696  0.0  0.0      0     0 ?        Zs   Feb21   0:59 [geesefs] <defunct>
root     86364  0.1  0.0 729152 34656 ?        Ssl  08:02   0:00 /usr/bin/geesefs --endpoint https://s3.ap-southeast-1.amazonaws.com -o allow_other --log-file /dev/stderr --no-checksum --memory-limit 4000 --max-flushers 32 --max-parallel-parts 32 --part-sizes 25 --dir-mode 0777 --file-mode 0666 data-sg03-data-mining-hive: /var/lib/kubelet/pods/c7c9eee4-a694-453d-a3fb-7529f2db1839/volumes/kubernetes.io~csi/data-sg03-data-mining-hive/mount
root     90348  0.0  0.0 119428   996 ?        S+   08:06   0:00 grep --color=auto geese
root     90531  0.0  0.0      0     0 ?        Zs   Feb24   0:58 [geesefs] <defunct>
root     90837  0.0  0.0      0     0 ?        Zs   Feb24   0:58 [geesefs] <defunct>

Hi, that behaviour is expected: at the time of the first listing the kernel still has those inodes cached, and it only drops them once GeeseFS rechecks their existence on the server and replies with ENOENT to the kernel.

Thanks for your reply.
Is there a way to avoid this error? Many of our jobs fail because os.listdir() returns the names of already-deleted objects.
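
(For reference, a minimal consumer-side workaround is to skip entries whose stat fails with ENOENT instead of failing the whole job. Below is a sketch in Go; listExisting is an illustrative helper, not part of GeeseFS, and the same pattern applies to Python jobs by wrapping os.stat() after os.listdir(). Go's os.ReadDir plus DirEntry.Info() surfaces the race the same way ls does above.)

// Minimal defensive-listing sketch; listExisting is an illustrative
// helper, not part of GeeseFS. DirEntry.Info() may return fs.ErrNotExist
// if the file vanished between the directory read and the stat, which is
// exactly the race reported above.
package listing

import (
	"errors"
	"io/fs"
	"os"
)

// listExisting returns only the names that still resolve; entries whose
// backing object is already gone are skipped instead of failing the job.
func listExisting(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, e := range entries {
		if _, err := e.Info(); err != nil {
			if errors.Is(err, fs.ErrNotExist) {
				continue // stale cached entry, object already deleted
			}
			return nil, err
		}
		names = append(names, e.Name())
	}
	return names, nil
}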

I think we can leverage FUSE_NOTIFY_DELETE / FUSE_NOTIFY_INVAL_ENTRY to prevent such errors. It requires patching the go-fuse library, but I'll look into it.
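
(For illustration, a minimal sketch of what such a notification looks like, assuming the hanwen/go-fuse v2 API; the FUSE library GeeseFS actually patches may expose different names. notifyGone and its node-ID parameters are hypothetical, not GeeseFS code.)

// Minimal notification sketch, assuming the hanwen/go-fuse v2 API.
// When the server discovers an object no longer exists in S3, it can
// tell the kernel to drop the cached dentry instead of waiting for the
// next lookup to fail.
package notify

import (
	"log"

	"github.com/hanwen/go-fuse/v2/fuse"
)

// notifyGone tells the kernel that "name" under the directory inode
// "parent" no longer exists, so later lookups do not hit a stale dentry.
func notifyGone(srv *fuse.Server, parent, child uint64, name string) {
	// FUSE_NOTIFY_DELETE: drop the dentry (and the child inode if unused).
	if st := srv.DeleteNotify(parent, child, name); st != fuse.OK {
		// Fall back to FUSE_NOTIFY_INVAL_ENTRY, which only invalidates
		// the name -> inode mapping.
		if st2 := srv.EntryNotify(parent, name); st2 != fuse.OK {
			log.Printf("invalidating %q failed: %v / %v", name, st, st2)
		}
	}
}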

Check the newest release, 0.35.0; it has kernel notifications.