kubernetes/cloud-provider-aws

Credential provider sometimes identifies private repo as public repo

thecodebeneath opened this issue · 5 comments

The credential provider sometimes identifies a private repo as a public repo when the image name was copied from a public registry to a private one. Our use case is supporting application deployments in "air-gapped" environments, where all dependencies are prepackaged and installed in private image repos. When the credential provider ran in our private AWS environment, it failed to fetch the ECR credentials because of how we retagged the images, so our EKS deployments failed.

The existing credential provider implementation certainly works in the majority of cases, so this may just be a violation of the principle of least surprise in cases such as ours. See the "Anything else we need to know?" section.

What happened:
When we bring over the ebs-csi driver images from the public AWS registry, we just prepend our private registry name to the original public image name. For example, if the original image is "public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver", we retag it and push it to our private ECR repo as "11111111111.dkr.ecr.us-gov-west-1.amazonaws.com/public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver". This gives us some traceability of the source.

What you expected to happen:
For EKS deployments in our private environment, we expect the pods to pull images successfully from ECR after the credential provider fetches the configured private repo credentials.

Note that when specifying EKS 1.25, and using compatible k8s binaries, the images pull successfully in our private env. I believe the kubelet at version 1.25 uses the in-tree credential provider, while EKS 1.27 / kubelet 1.27 is the first iteration to require the external credential provider.

How to reproduce it (as minimally and precisely as possible):
Using any image from the ECR Public Gallery (public.ecr.aws), retag the image when pushing it to a private ECR repository. The new image name should preserve the original name and only add the private registry as a prefix. For example:
Original image: "public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver"
Private retagged image: "11111111111.dkr.ecr.us-gov-west-1.amazonaws.com/public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver"

Use kubectl to apply a deployment manifest that uses an "image:" from the private repository.

Our EKS version is 1.27.x, and kubelet v1.27.5-eks-43840fb is configured correctly to use the external credential provider config file and binary v1.27.2.

Anything else we need to know?:
There's a strings.Contains statement (see here) that sees public.ecr.aws in the middle of our image name, mistakenly treats it as a public registry, and thus fails to get the credentials. You can verify this by running the ecr-credential-provider binary on one of the worker nodes.

The following command will fail because the repository name contains public.ecr.aws:

echo '{"kind": "CredentialProviderRequest", "apiVersion": "credentialprovider.kubelet.k8s.io/v1", "image": "11111111111.dkr.ecr.us-gov-east-1.amazonaws.com/public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver"}' | ./ecr-credential-provider

However, if we reference a different image without the public.ecr.aws, it succeeds:

echo '{"kind": "CredentialProviderRequest", "apiVersion": "credentialprovider.kubelet.k8s.io/v1", "image": "11111111111.dkr.ecr.us-gov-west-1.amazonaws.com/busybox"}' | ./ecr-credential-provider
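To illustrate why the first command fails and the second succeeds, here is a minimal Go sketch that contrasts a substring check over the whole image reference with a check on the registry host only. This is not the provider's actual code, and it is not necessarily how the issue should be fixed upstream; the function names (registryHost, isPublicBySubstring, isPublicByHost) are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// ecrPublicHost is the registry host of the ECR Public Gallery.
const ecrPublicHost = "public.ecr.aws"

// registryHost (hypothetical helper) returns everything before the first "/"
// of an image reference, which is where the registry host normally sits.
func registryHost(image string) string {
	host, _, _ := strings.Cut(image, "/")
	return host
}

// isPublicBySubstring mimics a substring check over the full image reference.
// It also matches when "public.ecr.aws" appears in the repository path.
func isPublicBySubstring(image string) bool {
	return strings.Contains(image, ecrPublicHost)
}

// isPublicByHost compares only the registry host, so a repository path that
// happens to contain "public.ecr.aws" is not treated as public.
func isPublicByHost(image string) bool {
	return registryHost(image) == ecrPublicHost
}

func main() {
	images := []string{
		"public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver",
		"11111111111.dkr.ecr.us-gov-west-1.amazonaws.com/public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver",
		"11111111111.dkr.ecr.us-gov-west-1.amazonaws.com/busybox",
	}
	for _, img := range images {
		fmt.Printf("substring=%t host=%t %s\n",
			isPublicBySubstring(img), isPublicByHost(img), img)
	}
}
```

With the retagged private image, the substring check reports "public" while the host check does not, which matches the behavior we see from the binary.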

Our workaround is that when we retag the image, we also modify the string "public.ecr.aws" to become "pub.ecr.aws". A complete example would be: "11111111111.dkr.ecr.us-gov-east-1.amazonaws.com/pub.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver"

Environment:

  • Kubernetes version (use kubectl version): 1.27.x
  • Cloud provider or hardware configuration: AWS GovCloud (US)
  • OS (e.g. from /etc/os-release): RHEL8
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

/kind bug

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

This was fixed in #667 and was cherry-picked to release-1.28 in #681. The bug doesn't exist in earlier versions of the ecr-credential-provider, which have no support for ECR Public.

Correction: the commit that introduced this support (#603) was mistakenly cut into v1.27.2 (that release was tagged incorrectly). It's fixed in v1.27.3.

/close

@cartermckinnon: Closing this issue.