GoogleCloudPlatform/gcs-fuse-csi-driver

Sidecar not working with workload identity

ck250186 opened this issue · 3 comments

We have an app that is trying to use the GCS FUSE CSI driver to mount a Cloud Storage bucket as an ephemeral volume.

We see this error when the Pod is initializing:

FailedMount
MountVolume.SetUp failed for volume "gcsfuse-csi-volume" : rpc error: code = Unauthenticated desc = failed to prepare storage service: storage service manager failed to setup service: context deadline exceeded

Investigating further, we see the following in the CSI driver logs:

{
  "insertId": "q8c6r58br9tgttx3",
  "jsonPayload": {
    "message": "/csi.v1.Node/NodePublishVolume called with request: volume_id:\"csi-893e225a821a4110581b7d0e775c6b3275fdc2f2b37bde3bbf897254c2e682cc\" target_path:\"/var/lib/kubelet/pods/e3aeade0-f02a-499b-91e0-737e037ed426/volumes/kubernetes.io~csi/gcsfuse_csi_volume/mount\" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"bucketName\" value:\"gcsfuse\" > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"true\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"gcsfuse_csi-test-7979c4764-z5pbh\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"gcsfuse-ns\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"e3aeade0-f02a-499b-91e0-737e037ed426\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"gcsfuse-sa\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.tokens\" value:\"***stripped***\" > volume_context:<key:\"mountOptions\" value:\"implicit-dirs,debug_fuse,debug_fs,debug_gcs\" > ",
    "pid": "1"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "cluster_name": "project_1_cluster_1",
      "pod_name": "gcsfusecsi-node-scx67",
      "location": "us-east1",
      "container_name": "gcs-fuse-csi-driver",
      "project_id": "project_1",
      "namespace_name": "kube-system"
    }
  },
  "timestamp": "2023-08-22T19:41:22.787536755Z",
  "severity": "INFO",
  "labels": {
    "compute.googleapis.com/resource_name": "project_1_cluster_1_nodepool_1",
    "k8s-pod/controller-revision-hash": "76cf56d8c8",
    "k8s-pod/pod-template-generation": "1",
    "k8s-pod/k8s-app": "gcs-fuse-csi-driver"
  },
  "logName": "projects/project_1/logs/stderr",
  "sourceLocation": {
    "file": "utils.go",
    "line": "83"
  },
  "receiveTimestamp": "2023-08-22T19:41:22.872838087Z"
}
{
  "insertId": "063x361d1ld2tskm",
  "jsonPayload": {
    "message": "error fetching initial token: GCP service account token fetch error: fetch GCP service account token error: rpc error: code = PermissionDenied desc = Permission 'iam.serviceAccounts.getAccessToken' denied on resource (or it may not exist).",
    "pid": "1"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "pod_name": "gcsfusecsi-node-scx67",
      "location": "us-east1",
      "container_name": "gcs-fuse-csi-driver",
      "project_id": "project_1",
      "cluster_name": "project_1_cluster_1",
      "namespace_name": "kube-system"
    }
  },
  "timestamp": "2023-08-22T19:31:14.678976822Z",
  "severity": "ERROR",
  "labels": {
    "k8s-pod/pod-template-generation": "1",
    "compute.googleapis.com/resource_name": "project_1_cluster_1_nodepool_1",
    "k8s-pod/k8s-app": "gcs-fuse-csi-driver",
    "k8s-pod/controller-revision-hash": "76cf56d8c8"
  },
  "logName": "projects/project_1/logs/stderr",
  "sourceLocation": {
    "file": "storage.go",
    "line": "71"
  },
  "receiveTimestamp": "2023-08-22T19:31:17.875220538Z"
}

I see that the CSI driver is initializing in the kube-system namespace, but my app is in another namespace, where we are using Workload Identity. I can see from the NodePublishVolume request that the driver knows which service account to use, but I'm not sure why I am getting the PermissionDenied message.

Hi @ck250186, the errors you saw in the Pod events are also generated by the CSI driver, so both messages come from the same place, and both indicate that the service account does not have the proper permission.

Could you follow the validation steps in https://github.com/GoogleCloudPlatform/gcs-fuse-csi-driver/blob/main/docs/authentication.md to check that the setup is correct?

We are seeing the same errors. I have gone through the validation steps, and authentication appears to be set up correctly.

When I stop using a KSA I get this in the logs:

Kubernetes SA [NAMESPACE/default] is not bound with a GCP SA, proceed with the IdentityBindingToken
/csi.v1.Node/NodePublishVolume failed with error: rpc error: code = PermissionDenied desc = failed to get GCS bucket "BUCKET": googleapi: Error 403: Caller does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist)., forbidden

We do have private nodes and I see that there is a webhook involved, but that does not seem to match the type of errors.
If we can provide you with any more information, please let me know.

Turns out I misconfigured the workloadIdentityUser binding: I had swapped NAMESPACE and KSA_NAME. Everything works fine now 👍
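
For anyone landing here with the same symptom, a minimal sketch of the member string that the roles/iam.workloadIdentityUser binding expects, where the namespace comes before the KSA name. The helper name `workload_identity_member` is hypothetical, and the sample values are the anonymized ones from the logs above; it is assumed here that project_1 is also the project that owns the cluster's workload identity pool.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Build the IAM member string for a Kubernetes service account (KSA).

    Hypothetical helper for illustration only. The format is
    serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME],
    so the namespace always precedes the KSA name.
    """
    for label, value in (
        ("project_id", project_id),
        ("namespace", namespace),
        ("ksa_name", ksa_name),
    ):
        # Reject empty values and characters that would break the bracketed form.
        if not value or any(ch in value for ch in "/[]"):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


# Values taken from the NodePublishVolume request earlier in this issue.
member = workload_identity_member("project_1", "gcsfuse-ns", "gcsfuse-sa")
print(member)  # serviceAccount:project_1.svc.id.goog[gcsfuse-ns/gcsfuse-sa]

# The misconfiguration described above: NAMESPACE and KSA_NAME swapped.
swapped = workload_identity_member("project_1", "gcsfuse-sa", "gcsfuse-ns")
print(swapped == member)  # False
```

Comparing the printed string against the members listed in the GCP service account's IAM policy is a quick way to spot a swapped NAMESPACE/KSA_NAME pair, which would be consistent with the 'iam.serviceAccounts.getAccessToken' PermissionDenied error shown above.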