GoogleCloudPlatform/gcs-fuse-csi-driver

Enabling file cache without webhook injector

christopher-kwan-541 opened this issue · 6 comments

The webhook injector specifies an emptyDir as the cache directory, and I would prefer to be able to specify a persistent volume so that I can readily increase the size of my cache depending on my read pattern.

I am trying to work around this by specifying the gke-gcsfuse-sidecar container on my own instead of using the pod annotations. This lets me choose which particular version of the CSI driver to use (trying out v1.2.0 right now) and also specify the cache dir as an ephemeral volume.

This is, however, still failing with: rpc error: code = FailedPrecondition desc = failed to find the sidecar container in Pod spec.

Any idea whether this approach is feasible at all? Below is an example of the YAML definition I am using:

    containers:
    - name: gke-gcsfuse-sidecar
      image: <REGISTRY>/gcs-fuse-csi-driver-sidecar-mounter:v1.2.0
      imagePullPolicy: IfNotPresent
      args:
      - --v=5
      - --grace-period=30
      volumeMounts:
      - mountPath: /gcsfuse-buffer
        name: gke-gcsfuse-buffer
      - mountPath: /gcsfuse-tmp
        name: gke-gcsfuse-tmp
      - mountPath: /gcsfuse-cache
        name: cachedir
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - ALL
        readOnlyRootFilesystem: true
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
    # ...
    # ...
    volumes:
    - ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: main-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: standard
      name: cachedir
    - csi:
        driver: gcsfuse.csi.storage.gke.io
        readOnly: true
        volumeAttributes:
          bucketName: <some_output_bucket>
          fileCacheCapacity: "-1"
          fileCacheForRangeRead: "true"
          metadataStatCacheCapacity: "-1"
          metadataTypeCacheCapacity: "-1"
          metadataCacheTTLSeconds: "-1"
          mountOptions: implicit-dirs,only-dir=<some_dir>,logging:severity:info
      name: fuse-dir
    - name: gke-gcsfuse-tmp
      emptyDir: {}
    - name: gke-gcsfuse-buffer
      emptyDir: {}

Hi @christopher-kwan-541, we will support a custom cache volume. The public documentation is pending and will be published this week.

Similar to https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#buffer-volume, you can specify a volume called gke-gcsfuse-cache.
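
For anyone landing here before that doc is published, a minimal sketch of what the custom cache volume could look like, mirroring the documented gke-gcsfuse-buffer override. The volume name gke-gcsfuse-cache comes from the comment above; the ephemeral claim settings are illustrative only:

    # Sketch: custom cache volume for the injected sidecar. The name
    # gke-gcsfuse-cache is what the driver is expected to look up;
    # the storage size and class are placeholders to adjust as needed.
    volumes:
    - name: gke-gcsfuse-cache
      ephemeral:
        volumeClaimTemplate:
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: standard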

In your case, you are completely bypassing the sidecar injection, and it should also work.

In your YAML example, the cachedir volume and the other volumes are not at the same indentation level. Is this on purpose?

Apologies. I have updated the YAML above with the correct indentation for the volumes section. It is still failing with the error:

rpc error: code = FailedPrecondition desc = failed to find the sidecar container in Pod spec

Any clues as to why that may be the case? I am using v1.2.0; let me know if I should be using a different release tag.

Hi @christopher-kwan-541, what is the current GKE version you are using? I cannot reproduce the issue using the shared YAML.

Please note that the file cache feature will be available on GKE this week, so you may want to wait a little to try out the official version.

I am on v1.26.11-gke.1055000. From our paired debugging today, it looks like I need to use the image hosted by Google.

Here is the full required YAML for posterity:

    securityContext:
      fsGroup: 100
    containers:
    - name: gke-gcsfuse-sidecar
      image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter@sha256:31880114306b1fb5d9e365ae7d4771815ea04eb56f0464a514a810df9470f88f
      imagePullPolicy: IfNotPresent
      args:
      - --v=5
      volumeMounts:
      - mountPath: /gcsfuse-buffer
        name: gke-gcsfuse-buffer
      - mountPath: /gcsfuse-tmp
        name: gke-gcsfuse-tmp
      - mountPath: /gcsfuse-cache
        name: cachedir
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
          - ALL
        readOnlyRootFilesystem: true
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
    # ...
    # ...
    volumes:
    - ephemeral:
        volumeClaimTemplate:
          metadata:
            labels:
              type: main-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: standard
      name: cachedir
    - csi:
        driver: gcsfuse.csi.storage.gke.io
        readOnly: true
        volumeAttributes:
          bucketName: <some_output_bucket>
          mountOptions: implicit-dirs,only-dir=<test_dir>,logging:severity:info,file-cache:max-size-mb:-1,file-cache:cache-file-for-range-read:true
      name: fuse-dir
    - name: gke-gcsfuse-tmp
      emptyDir: {}
    - name: gke-gcsfuse-buffer
      emptyDir: {}
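
As an aside, here is a quick way to double-check which sidecar image a pod actually ended up with; this is plain kubectl jsonpath filtering, nothing specific to the CSI driver:

    kubectl get pod <pod-name> \
      -o jsonpath='{.spec.containers[?(@.name=="gke-gcsfuse-sidecar")].image}'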

I verified that the sidecar is now using the cache and that it is saving files into the specified ephemeral volume. However, read performance is still poor despite the cache; if anything, it looks like it has degraded further with the cache on.

Let me know if you would prefer that I create a new issue, since this issue pertains to being able to instantiate the cache.

Thanks @christopher-kwan-541 , I will close this issue.

Let's follow up through the Google support channel.

If GKE logging still does not have the proper permission role added, please run the following command to collect the sidecar container log and share the output file with the Google support engineer you are working with.

kubectl logs <pod-name> -c gke-gcsfuse-sidecar > sidecar-container.log
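
If the sidecar container has restarted in the meantime, the previous instance's log may also be worth capturing; the --previous flag is standard kubectl behavior, not specific to this driver:

    kubectl logs <pod-name> -c gke-gcsfuse-sidecar --previous > sidecar-container-previous.log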