awslabs/amazon-eks-ami

Pulling an image with ctr so that it is available to EKS pods


Environment:

  • AWS Region: us-west-2
  • Instance Type(s): m7.*
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.7
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.29
  • AMI Version: /aws/service/eks/optimized-ami/1.29/amazon-linux-2023/x86_64/standard/recommended/image_id

I am trying to use ctr to pre-pull the latest version of my image daily with EC2 Image Builder, and then use Karpenter to launch nodes from these AMIs in a NodePool. The idea is to reduce worst-case scale-up time when the cluster needs to autoscale and would otherwise have to pull a large ML Docker image.

This is the Image Builder component template:

name: ml-image-pull
description: Pulls the latest ml-image Docker Image.
schemaVersion: 1.0

phases:
  - name: build
    steps:
      - name: pull-ml-image
        action: ExecuteBash
        inputs:
          commands:
            - password=$(aws ecr get-login-password --region us-west-2)
            - echo "pulling ml-image:latest..."
              # Redirecting stdout because the pull produces thousands of log lines.
            - sudo ctr --namespace k8s.io images pull --user "AWS:$password" account_id.dkr.ecr.us-west-2.amazonaws.com/ml-image:latest > /dev/null
              # The list command is also noisy, so only print the header and the line we care about.
            - sudo ctr --namespace k8s.io images list | head -n 1
            - sudo ctr --namespace k8s.io images list | grep ml-image
  - name: test
    steps:
      - name: confirm-ml-image-pulled
        action: ExecuteBash
        inputs:
          commands:
            - set -e
            - sudo ctr --namespace k8s.io images list | grep ml-image

This all works as expected; however, the image is still pulled when I try to launch a pod through JupyterHub using the latest tag.

Even SSHing into the node, pulling a specific tag foo with the same commands, and then launching a pod via JupyterHub that references foo still triggers a pull of the foo tag. I've confirmed that this all takes place on the same node. A sketch of the manual check is below.
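For reference, this is roughly the manual check I ran on the node (the tag foo is a placeholder, and the crictl step assumes crictl is available on the node):

    # Pull a specific tag directly into containerd's k8s.io namespace.
    password=$(aws ecr get-login-password --region us-west-2)
    sudo ctr --namespace k8s.io images pull --user "AWS:$password" \
      account_id.dkr.ecr.us-west-2.amazonaws.com/ml-image:foo

    # Confirm the image is visible through the CRI, i.e. what the kubelet sees.
    sudo crictl images | grep ml-image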

Thoughts? My understanding is that an image pulled into the k8s.io containerd namespace should be usable by pods on that node, right?

Hey @seanturner026, thanks for reporting. We haven't seen such an issue before; do you mind sharing some reproduction steps?

Cheers @Issacwww, will put something together. Can you confirm that this approach is fundamentally sound, i.e. that a Docker image in the k8s.io containerd namespace should be accessible to EKS (the kubelet)?

The namespace should be OK

This works when configuring imagePullPolicy to be something other than Always 🤦‍♂️
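For anyone who finds this later: Kubernetes defaults imagePullPolicy to Always when a container uses the latest tag (or no tag), which is why the pre-pulled image was being re-pulled. A minimal pod spec sketch of the fix (the pod name and image URI are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: ml-notebook
    spec:
      containers:
        - name: ml
          image: account_id.dkr.ecr.us-west-2.amazonaws.com/ml-image:latest
          # IfNotPresent lets the kubelet reuse the image already present in
          # the k8s.io containerd namespace instead of re-pulling on every start.
          imagePullPolicy: IfNotPresent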

That said, thanks for taking a look! This is a really valuable time optimization for us, so hopefully it is useful to others in the future.

We ultimately paused this project because it wasn't offering any speed improvement, likely due to something I did incorrectly.

The two primary observations:

  • Any node using the AMI built by Image Builder took about twice as long to come online as a node using the default AMI spun up by Karpenter.
  • Once a node built from our AMI was online, the file system appeared to be extremely cold: loading a Dask JupyterHub plugin took 40 seconds, whereas it generally takes 2 seconds. Launching a second JupyterHub workload on the same node loaded the process in 0.25 seconds, which matches the behavior of nodes on the stock AMI (which do need to pull our data science Docker image). This behavior is specific to JupyterHub, but likely offers some insight (see the note below).
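In case it helps anyone else, my working theory (an assumption on my part, not verified) is that EBS volumes created from an AMI's snapshot load blocks lazily from S3, so the first reads of the pre-pulled image layers are slow. EBS Fast Snapshot Restore may mitigate this; a sketch (the availability zone and snapshot ID are placeholders):

    # Enable Fast Snapshot Restore for the AMI's backing snapshot so that new
    # volumes created from it come up fully initialized.
    aws ec2 enable-fast-snapshot-restores \
      --availability-zones us-west-2a \
      --source-snapshot-ids snap-0123456789abcdef0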