fluxcd/source-controller

Can't use GCP Workload identity to pull OCI Helm Charts from GAR

IlyesSemlali opened this issue · 5 comments

Context

We're trying to remove the need for tokens when pulling Helm charts from the Google Artifact Registry (GAR) by using Workload Identity. Our cluster runs Flux v2.3.0 on GKE v1.29.7.

We've been following this documentation page: Workload Identity

Here's what we did:

  1. Enabled workload identity on the GKE cluster
  2. Created a Google SA with read access to the GAR
  3. Manually annotated the source-controller, kustomize-controller and image-reflector-controller service accounts
  4. Tried to pull Helm charts from the GAR

Extra information:

  • The chart pull worked with a JWT before we tried to use workload identity
  • Listing repositories with the gcloud command inside a pod that uses workload identity works:
kubectl exec debug-pod -it -- /google-cloud-sdk/bin/gcloud artifacts repositories list
REPOSITORY             FORMAT  MODE                 DESCRIPTION                                                                                 LOCATION      LABELS                          ENCRYPTION          CREATE_TIME          UPDATE_TIME          SIZE (MB)
charts-repo                DOCKER  STANDARD_REPOSITORY  Registry for OCI Helm Charts                                                                europe                                        Google-managed key  2024-05-15T14:05:39  2024-08-22T14:12:51  3.444

Here's a flux pull command output (from inside a pod):

root@debug-pod:/# flux pull artifact oci://europe-docker.pkg.dev/our-project/charts-repo/stateless-app:12.3.5 --output /tmp
► pulling artifact from europe-docker.pkg.dev/our-project/charts-repo/stateless-app:12.3.5
✗ GET https://europe-docker.pkg.dev/v2/token?scope=repository%3Aour-project%2Fcharts-repo%2Fstateless-app%3Apull&service=: DENIED: Unauthenticated request. Unauthenticated requests do not have permission "artifactregistry.repositories.downloadArtifacts" on resource "projects/our-project/locations/europe/repositories/charts-repo" (or it may not exist)

And here's the kubectl describe helmrepository chart-repo:

Message:                  HelmChart 'flux-system/namespace-oci-app' is not ready: chart pull error: failed to download chart for remote reference: failed to get 'oci://europe-docker.pkg.dev/our-project/charts-repo/stateless-app:12.3.5': failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://europe-docker.pkg.dev/v2/token?scope=repository%!!(MISSING)!(MISSING)A(MISSING)our-project%!!(MISSING)!(MISSING)F(MISSING)charts-repo%!!(MISSING)!(MISSING)F(MISSING)stateless-app%!!(MISSING)!(MISSING)A(MISSING)pull&service=europe-docker.pkg.dev: 403 Forbidden

As you can see, the flux CLI isn't using workload identity and tries to pull the OCI chart anonymously.

Is there something more that we need to do to get it to work with workload identity?

Hello @IlyesSemlali, I just tested this feature and it works, I was able to pull an OCI Helm Chart from GAR through Workload Identity. Can you please provide more details of your configuration so we can troubleshoot? For example, your HelmRepository and HelmChart YAML manifests. Also, is the debug-pod using the same Kubernetes ServiceAccount as the source-controller pod?

Hi @matheuscscp, thanks for your answer!

The debug-pod is using the same SA as the source-controller pod.

Here is the HelmRepository that we use:

HelmRepository
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  labels:
    kustomize.toolkit.fluxcd.io/name: flux-system
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: helm-repo-oci
  namespace: flux-system
  resourceVersion: "1177908037"
spec:
  interval: 5m0s
  provider: generic
  type: oci
  url: oci://europe-docker.pkg.dev/our-project/helm-repo
status: {}

And here is the HelmChart manifest:

HelmChart
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmChart
metadata:
  creationTimestamp: "2024-08-28T14:26:49Z"
  finalizers:
  - finalizers.fluxcd.io
  generation: 1
  labels:
    helm.toolkit.fluxcd.io/name: oci-app
    helm.toolkit.fluxcd.io/namespace: namespace
  name: namespace-oci-app
  namespace: flux-system
  resourceVersion: "1186147102"
  uid: ab1b685a-c14d-462c-80ef-0d578902b91f
spec:
  chart: stateless-app
  interval: 5m0s
  reconcileStrategy: ChartVersion
  sourceRef:
    kind: HelmRepository
    name: helm-charts-oci
  version: 2.0.4
status:
  conditions:
  - lastTransitionTime: "2024-09-06T12:47:23Z"
    message: building artifact
    observedGeneration: 1
    reason: ProgressingWithRetry
    status: "True"
    type: Reconciling
  - lastTransitionTime: "2024-09-06T12:47:23Z"
    message: 'chart pull error: failed to download chart for remote reference: failed
      to get ''oci://europe-docker.pkg.dev/our-project/helm-charts/stateless-app:2.0.4'':
      failed to authorize: failed to fetch anonymous token: unexpected status from
      GET request to https://europe-docker.pkg.dev/v2/token?scope=repository%!!(MISSING)A(MISSING)our-project%!!(MISSING)F(MISSING)helm-charts%!!(MISSING)F(MISSING)stateless-app%!!(MISSING)A(MISSING)pull&service=europe-docker.pkg.dev:
      403 Forbidden'
    observedGeneration: 1
    reason: ChartPullError
    status: "False"
    type: Ready
  - lastTransitionTime: "2024-08-29T10:15:00Z"
    message: 'chart pull error: failed to download chart for remote reference: failed
      to get ''oci://europe-docker.pkg.dev/our-project/helm-charts/stateless-app:2.0.4'':
      failed to authorize: failed to fetch anonymous token: unexpected status from
      GET request to https://europe-docker.pkg.dev/v2/token?scope=repository%!A(MISSING)our-project%!F(MISSING)helm-charts%!F(MISSING)stateless-app%!A(MISSING)pull&service=europe-docker.pkg.dev:
      403 Forbidden'
    observedGeneration: 1
    reason: ChartPullError
    status: "True"
    type: FetchFailed
  observedGeneration: -1

And also here's the annotation command we used:

kubectl annotate serviceaccount --namespace=flux-system source-controller \
    "iam.gke.io/gcp-service-account=flux-service-account@our-project.iam.gserviceaccount.com" --overwrite
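
For reference, the same annotation can be kept declaratively instead of being applied by hand. The following is a minimal sketch, assuming the cluster was bootstrapped with the standard flux-system/kustomization.yaml layout (gotk-components.yaml and gotk-sync.yaml); that layout is an assumption, while the service account name and annotation value are taken from the command above:

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Adds the Workload Identity annotation to the source-controller
  # ServiceAccount, equivalent to the kubectl annotate command above.
  - patch: |
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: source-controller
        annotations:
          iam.gke.io/gcp-service-account: flux-service-account@our-project.iam.gserviceaccount.com
    target:
      kind: ServiceAccount
      name: source-controller
```

Keeping the annotation in Git this way means it is less likely to be lost when the Flux components are upgraded or re-applied.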

Thanks @IlyesSemlali, I've already spotted your issue :)

Change spec.provider from generic to gcp; this is essential for Workload Identity (WI) to work.
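
As a sketch of that change, here is the HelmRepository posted above with only spec.provider modified. The server-managed fields (labels, resourceVersion, status) are omitted for brevity; everything else is copied from the original manifest:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: helm-repo-oci
  namespace: flux-system
spec:
  interval: 5m0s
  # Previously "generic". With "gcp", source-controller obtains registry
  # credentials through Workload Identity instead of pulling anonymously.
  provider: gcp
  type: oci
  url: oci://europe-docker.pkg.dev/our-project/helm-repo
```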

Hi @matheuscscp, sorry for the delayed answer, but you figured it out, thanks a lot!

It could be a great improvement to the docs to add a line or two to the workload identity guide telling us to make sure that the provider is properly specified. What do you think?

Hello @IlyesSemlali, here we are specifically talking about a field from the HelmRepository API, which is spec.provider. This API is properly documented here: https://fluxcd.io/flux/components/source/helmrepositories/#provider

Every Flux API containing this field is documented the same way.