GoogleCloudDataproc/hadoop-connectors

Question: How to use gcs-connector on GKE with Workload Identity

aristapimenta opened this issue · 1 comment

Hi,

I'm currently using gcs-connector 2.2.7 on a GKE cluster with a static service account JSON key. I want to get rid of this static credential and start using Workload Identity.

I already use Workload Identity for many other applications in this cluster, so I'm confident that at least the following items are configured correctly:

  1. The GCP SA has the required permissions on the bucket (because it works using the JSON key)
  2. The k8s SA has the Workload Identity User role (roles/iam.workloadIdentityUser) on the GCP SA
  3. The k8s SA has the iam.gke.io/gcp-service-account annotation with the GCP SA email
  4. The pod is configured to use this k8s SA
  5. On the same container where I run the gcs-connector I also run a Go binary that correctly authenticates via Workload Identity
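Item 5 can also be cross-checked independently of the Go binary by asking the metadata server which identity it serves to the pod. The snippet below is only a minimal sketch: it assumes Python 3 is available in the container, and the helper names `build_metadata_request` and `fetch_active_service_account` are made up for illustration. With Workload Identity set up as described, the printed email should be the GCP SA rather than the node's default service account.

```python
import urllib.request

# Standard metadata endpoint for the default service account's email.
# On a GKE pod with Workload Identity this is answered by the GKE metadata server.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def build_metadata_request(url: str = METADATA_URL) -> urllib.request.Request:
    """Build the GET request; the Metadata-Flavor header is mandatory."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_active_service_account(timeout: float = 5.0) -> str:
    """Return the email of the identity the metadata server hands to this pod."""
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


if __name__ == "__main__":
    # Only meaningful when run inside the pod; elsewhere the host will not resolve.
    print(fetch_active_service_account())
```

If this prints the expected GCP SA email, the pod-level setup is probably fine and the 403 is more likely to come from the connector's own configuration.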

I have tried a few different settings in core-site.xml to enable Workload Identity, but all I get are 403 errors:

  • google.cloud.auth.type with COMPUTE_ENGINE and APPLICATION_DEFAULT
  • fs.gs.auth.type with COMPUTE_ENGINE and APPLICATION_DEFAULT
  • Specifying neither google.cloud.auth.type nor fs.gs.auth.type, hoping the default would be the right one

I have also tried gcs-connector 2.2.15 instead of 2.2.7, but it didn't help.

I'd appreciate any help. Is Workload Identity supported in the gcs-connector at all?

It seems you don't need to specify anything related to the Google service account for Workload Identity to work.
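To act on that, it may help to list whatever credential-related properties are still present in core-site.xml so they can be removed. The following is a rough sketch, not part of the connector: the function name `find_auth_properties`, the `SUSPECT_FRAGMENTS` tuple, and the sample XML are all hypothetical, and the assumption that "auth.type" or "service.account" in a property name marks it as credential-related is only a heuristic.

```python
import xml.etree.ElementTree as ET

# Assumption: property names containing these fragments relate to credentials.
SUSPECT_FRAGMENTS = ("auth.type", "service.account")


def find_auth_properties(core_site_xml: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs of properties that look credential-related."""
    root = ET.fromstring(core_site_xml)
    found = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if any(fragment in name for fragment in SUSPECT_FRAGMENTS):
            found.append((name, value))
    return found


# Hypothetical core-site.xml content, used for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>fs.gs.auth.type</name>
    <value>COMPUTE_ENGINE</value>
  </property>
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in find_auth_properties(SAMPLE):
        print(f"{name} = {value}")
```

Any property it reports is a candidate for removal; whether removing it resolves the 403 would still need to be confirmed against the cluster.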