GoogleCloudPlatform/gcs-fuse-csi-driver

Problem with too many open files

lewandowskim1988 opened this issue · 2 comments

Hello,

My current setup is as follows:

  • Managed GKE cluster 1.29.3-gke.1282001
  • Cloud Storage FUSE CSI driver enabled, latest version
  • Bucket in the same region as the k8s cluster
  • Pods running with root privileges

What is working

I am able to create a PV and a PVC and mount my bucket as storage in a pod.
I can write to it, delete and update files, and so on.

What is not working

My whole FUSE configuration exists to provide RWX storage, since I would like to run a Nextflow pipeline that requires storage shared between pods. Everything works until a worker pod starts and tries to run my workload. Soon after that, I receive the following error message from the FUSE pod:

I0515 12:44:53.622504       1 main.go:48] Running Google Cloud Storage FUSE CSI driver sidecar mounter version v1.2.0-gke.0
I0515 12:44:55.123322       1 sidecar_mounter_config.go:101] connecting to socket "/gcsfuse-tmp/.volumes/gcs-fuse-csi-pv/socket"
I0515 12:44:55.123454       1 fdchannel.go:48] get the underlying socket
I0515 12:44:55.123473       1 fdchannel.go:60] calling recvmsg...
I0515 12:44:55.123690       1 fdchannel.go:69] parsing SCM...
I0515 12:44:55.123713       1 fdchannel.go:76] parsing SCM_RIGHTS...
I0515 12:44:55.123930       1 sidecar_mounter_config.go:269] gcsfuse config file content: map[cache-dir: logging:map[file-path:/dev/fd/1 format:json severity:warning]]
I0515 12:44:55.124012       1 sidecar_mounter.go:49] start to mount bucket "bucket-to-test-fuse" for volume "gcs-fuse-csi-pv"
I0515 12:44:55.124105       1 sidecar_mounter.go:68] gcsfuse mounting with args [--app-name gke-gcs-fuse-csi --foreground --uid 0 --gid 0 --temp-dir /gcsfuse-buffer/.volumes/gcs-fuse-csi-pv/temp-dir --config-file /gcsfuse-tmp/.volumes/gcs-fuse-csi-pv/config.yaml --implicit-dirs bucket-to-test-fuse /dev/fd/3]...
I0515 12:44:55.124275       1 main.go:73] waiting for SIGTERM signal...
I0515 12:44:55.125416       1 sidecar_mounter.go:103] gcsfuse for bucket "bucket-to-test-fuse", volume "gcs-fuse-csi-pv" started with process id 21
{"timestamp":{"seconds":1715777124,"nanos":810077094},"severity":"ERROR","message":"Rename: too many open files, too many objects to be renamed: too many open files"}
{"timestamp":{"seconds":1715777124,"nanos":810207701},"severity":"ERROR","message":"fuse: *fuseops.RenameOp error: too many open files"}
I0515 12:45:31.875930       1 main.go:110] received SIGTERM signal, waiting for all the gcsfuse processes exit...
I0515 12:45:31.876018       1 sidecar_mounter.go:75] sending SIGTERM to gcsfuse process: /gcsfuse --app-name gke-gcs-fuse-csi --foreground --uid 0 --gid 0 --temp-dir /gcsfuse-buffer/.volumes/gcs-fuse-csi-pv/temp-dir --config-file /gcsfuse-tmp/.volumes/gcs-fuse-csi-pv/config.yaml --implicit-dirs bucket-to-test-fuse /dev/fd/3
I0515 12:45:31.882649       1 sidecar_mounter.go:111] [gcs-fuse-csi-pv] gcsfuse was terminated.
I0515 12:45:31.882676       1 main.go:118] exiting sidecar mounter...

How I was trying to fix my problem

I tried adding more ephemeral storage, CPU, and memory to the FUSE pod, but without any luck.
I also tried changing the sysctl for the maximum number of open files, but I receive the following error: forbidden sysctl: "fs.file-max" not allowlisted (please note that I am new to GCP).

Could you advise me on how to handle this issue?

This error

"fuse: *fuseops.RenameOp error: too many open files"

is discussed here as well:
GoogleCloudPlatform/gcsfuse#644

--rename-dir-limit is the flag that needs to be set, as your application seems to be performing directory rename operations.
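
For reference, here is a minimal sketch of how the flag could be passed through the PersistentVolume's mountOptions. The volume and bucket names and the implicit-dirs, uid, and gid options are taken from the logs above. The limit value, the capacity, and the storage class name are illustrative assumptions, not values confirmed in this thread, so adjust them to your workload:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv                     # volume name as it appears in the sidecar logs
spec:
  accessModes:
    - ReadWriteMany                         # RWX, as required for sharing between pods
  capacity:
    storage: 5Gi                            # assumption: placeholder size
  storageClassName: example-storage-class   # assumption: placeholder name
  mountOptions:
    - implicit-dirs
    - uid=0
    - gid=0
    # Assumption: 200000 is an arbitrary example. Choose a value larger than
    # the number of objects in the biggest directory your pipeline renames.
    - rename-dir-limit=200000
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: bucket-to-test-fuse       # bucket name from the logs
```

After re-creating the PV and restarting the pod, the "gcsfuse mounting with args" line in the sidecar log should show whether the option was actually passed to gcsfuse.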

Thank you for the help. This indeed resolved the issue with too many open files.