kubernetes/cloud-provider-aws

Allow `ecr-credential-provider` to generate credentials for any registry

jBouyoud opened this issue ยท 9 comments

What would you like to be added: Allow ecr-credential-provider to generate credentials for any registry (only base restrictions on CredentialProviderConfig)

Why is this needed:
Today ecr-credential-provider, is limited to public.ecr.aws , and registry url compliant with ecrPattern.

In order to achieve this functionality we need to remove this restriction.

End user use case :
Using an EKS Cluster, we need to mirror public registry where SLA aren't garrantee or with some IP Rate limits restrictions.

In order to achieve that, I've setuped the followings :

  • ECR Pull through cache to public.ecr.aws , quay.io and registry.k8s.io
  • Containerd configured to mirror public.ecr.aws , quay.io and registry.k8s.io to my private ECR registry
  • Private registry is hosted on another account (cross account scenario), on the same aws region
  • aws-ecr-credential-provider configured to generate credentials for : public.ecr.aws , quay.io and registry.k8s.io with a config file like :
    {
      "apiVersion": "kubelet.config.k8s.io/v1alpha1",
      "kind": "CredentialProviderConfig",
      "providers": [
        {
          "name": "ecr-credential-provider",
          "matchImages": [
            "*.dkr.ecr.*.amazonaws.com",
            "*.dkr.ecr.*.amazonaws.com/*",
            "*.dkr.ecr.*.amazonaws.com.cn",
            "*.dkr.ecr-fips.*.amazonaws.com",
            "*.dkr.ecr.*.c2s.ic.gov",
            "*.dkr.ecr.*.sc2s.sgov.gov",
            "quay.io",
            "public.ecr.aws",
            "registry.k8s.io"
          ],
          "defaultCacheDuration": "12h",
          "apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1"
        }
      ]
     }
  • Current status : No credentials has been generated, and pull through cache aren't use
  • Expected status : Credentials should be generated (for the current node , on the same aws region, or an explicit configuration on CredentialProviderConfig) , to allow my node to use my private registry mirror for public registry.

Potential solution:

I wonder if it's really needed to generate an error here and not relying on SDK region behavior (and potential user configuration in provider configuration - AWS_REGION env variable in CredentialProviderConfig for exemple).

By removing program error here , with something like :

if len(splitHost) != 6 {
  return "", nil
}

(further code updates will be probably needed ;-) )
this provider will rely on AWS SDK default behaviour, and let opportunity to users to handle more uses cases.
Relying on SDK default behavior will still throw an error when no region are configured (Eg. : E0922 13:04:35.558288 57050 main.go:255] Error running credential provider plugin: MissingRegion: could not find region configuration )

Ref.: #667 (comment)

Sample working code modification : https://github.com/jBouyoud/cloud-provider-aws/commit/5a167d0ae233981db3a992dd767b1ff1f60fdcda

I'll be glad to help on this topic, please let me known how.
Regards

/kind feature

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

To make sure I understand, you want to be able to create a Pod using an image like registry.k8s.io/foo:latest. But you want containerd to actually pull and use my-private-mirror.com/registry.k8s.io/foo:latest?

I get why this could be handy, but I'd try to solve this as far up the stack (and as early in the Pod lifecycle) as possible. You could use a mutating admission webhook to rewrite the images of incoming Pod's to use your mirror, for example ( from a quick search, there's a handful of implementations of this out there). This would allow you to modify your mirror config more easily and would make the k8s API objects reflect the actual desired state.

Hi @cartermckinnon, yes that's exactly that I want to do.

Deploying a mutation webhook could also be an option (second choice after this one) but requires to setup new critical components (on admission chain), and currently we want to avoid this as far as possible.

This is also due to our responsabillity model :

  • Developers : who are Kube manifest owners : They probably won't understand why their pod manifest will be mutated (even if this will be well documented in our organization)
  • Platform/SRE guys (me) : infra owners (like Kube) : This can this is only an SRE requirements to pull image from a secure/available location.

The intent here is only to allow ecr-credential-provider to provide credential according to CredentialProviderConfig configuration. Today this works only for a subset of registry (public.ecr.aws and all url compliant with an ecr registry)

Also, not all usecases can be handled with a mutating webhook, because some images are pulled without a podspec, such as the registry.k8s.io/pause image (my specific use case, also pull-through mirrored on AWS ECR), or gcr.io/etcd-development/etcd for control plane nodes, where etcd is not running as a kubernetes pod.

kmala commented

because some images are pulled without a podspec, such as the registry.k8s.io/pause image

The only image i can think of being used without mention in pod spec is pause container and it can be updated using the containerd config https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd
If you want to update the control plane nodes images, then you must be creating your own control plane right? which would mean you should have access to use any image you want correct?

Overriding is possible, but then you'd have to additionally maintain the version in the tag.

The default pause / etcd images are set by my kubernetes distribution.

@cartermckinnon, @kmala

The intent is to let the end-user choose how to use ecr-credential-provider according to CredentialProviderConfig expressivity, since code appear to be very minor (remove a custom check to rely on aws sdk behavior instead).

Please let me known, if you think this is relevant feature or not.
I'll happy to open a pull request on that.

As workaround (for posterity), I add the following configuration to my existing kubelet image-credential-provider-config :

{
      "name": "mirrored-ecr-credential-provider",
      "matchImages": [
        "quay.io",
        "public.ecr.aws",
        "registry.k8s.io"
      ],
      "defaultCacheDuration": "12h",
      "apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1"
    }

where mirrored-ecr-credential-provider looks like :

#!/usr/bin/env bash
set -eu

CREDENTIAL_PROCESS_DEBUG_ENABLED="false"

# Catch image in query if container a mirroring rule , rewrite the request
REQUEST="$(cat)"
REQUEST_IMAGE="$(echo "${REQUEST}" | jq -r ".image")"
REQUEST_HOST="$(echo "${REQUEST_IMAGE}" | cut -d'/' -f1)"

MIRROR_REQUEST="$(echo "${REQUEST}" | jq ".image = \"<account>.dkr.ecr.<region>.amazonaws.com/\" + .image")"

if [[ "${CREDENTIAL_PROCESS_DEBUG_ENABLED}" == "true" ]]; then
  echo "Mirror image for ${REQUEST_HOST} -- ${MIRROR_REQUEST}" >> /var/log/mirrored-ecr-credential-provider-requests.log
  echo "${MIRROR_REQUEST}" | /etc/eks/image-credential-provider/ecr-credential-provider | sed -e "s/<account>.dkr.ecr.<region>.amazonaws.com/${REQUEST_HOST}/" | tee -a /var/log/mirrored-ecr-credential-provider-responses.log
else
  echo "${MIRROR_REQUEST}" | /etc/eks/image-credential-provider/ecr-credential-provider | sed -e "s/<account>.dkr.ecr.<region>.amazonaws.com/${REQUEST_HOST}/"
fi

I don't think this is the best way to achieve the goal, but I'm not opposed to changing a couple lines in the ecr-credential-provider to unblock you; feel free to open a PR.

It's worth considering that if you have any issues with your containerd mirror config, you could end up sending your ECR credentials to the wrong remote.