awslabs/amazon-eks-ami

EKS Windows pods failing to start up in 1.29

Closed · 3 comments

What happened:

Windows pods are not starting up successfully. They fail with this error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "amazonaws.com/eks/pause-windows:latest": failed to pull image "amazonaws.com/eks/pause-windows:latest": failed to pull and unpack image "amazonaws.com/eks/pause-windows:latest": failed to resolve reference "amazonaws.com/eks/pause-windows:latest": failed to do request: Head "https://amazonaws.com/v2/eks/pause-windows/manifests/latest": dial tcp 207.171.166.22:443: connectex: No connection could be made because the target machine actively refused it.
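For anyone triaging the same message, here is a minimal sketch that pulls the image reference out of the error text and checks whether its registry host looks like a regional ECR endpoint. The helper names (`parse_sandbox_image`, `looks_like_ecr_registry`) are hypothetical, and the assumption that a pullable pause image lives on a host of the form `<account>.dkr.ecr.<region>.amazonaws.com` is mine, not something stated in this thread.

```python
import re

# Shortened copy of the error reported above.
ERROR = (
    'Failed to create pod sandbox: rpc error: code = Unknown desc = '
    'failed to get sandbox image "amazonaws.com/eks/pause-windows:latest": '
    'failed to pull image "amazonaws.com/eks/pause-windows:latest"'
)

# Assumption: regional ECR registries use <12-digit account>.dkr.ecr.<region>.amazonaws.com
ECR_HOST = re.compile(r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com(\.cn)?$")


def parse_sandbox_image(message):
    """Return (registry, repository, tag) for the sandbox image named in the error."""
    match = re.search(r'failed to get sandbox image "([^"]+)"', message)
    if match is None:
        raise ValueError("no sandbox image reference found in message")
    registry, _, remainder = match.group(1).partition("/")
    repository, _, tag = remainder.rpartition(":")
    return registry, repository, tag


def looks_like_ecr_registry(host):
    """True if the host matches the assumed regional ECR hostname pattern."""
    return ECR_HOST.match(host) is not None


registry, repository, tag = parse_sandbox_image(ERROR)
print(registry, repository, tag, looks_like_ecr_registry(registry))
```

In the message above the registry part is the bare `amazonaws.com` domain, so under that assumption the reference would not resolve to a registry the node can pull from, which is consistent with the refused connection.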

What you expected to happen:

The pods start up normally.

How to reproduce it (as minimally and precisely as possible):

Create a Windows node group. Deployment of Windows pods succeeds initially, but starts failing after a day or so.

Anything else we need to know?:

This seems to be related to issue #1597, which was happening on the Linux nodes.

Environment:

  • AWS Region: us-east-1
  • Instance Type(s): m6a.large
  • EKS Platform version (use aws eks describe-cluster --name <name> --query cluster.platformVersion): eks.1
  • Kubernetes version (use aws eks describe-cluster --name <name> --query cluster.version): 1.29
  • AMI Version: 1.29-2024.01.09
  • AMI Type: WINDOWS_FULL_2019_x86_64
  • Kernel (e.g. uname -a):
  • Release information (run cat /etc/eks/release on a node):

This repository is only used to track issues with the Amazon Linux based EKS AMIs; the Windows EKS team is aware of this issue.

Hi @koalafi-hilarym, EKS Windows has released a fix for this issue. Please upgrade your Windows nodegroup to use the latest 1.29 AMI, which has the release version 1.29-2024.02.06.
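A quick way to check whether a node group's AMI release already includes the fix is to compare release version strings. The sketch below assumes the format is `<Kubernetes minor>-<YYYY.MM.DD>`, which is inferred only from the two values quoted in this thread, and the function names are hypothetical. It deliberately refuses to compare across Kubernetes minors, since only the 1.29 fixed release is named here.

```python
from datetime import date

# Fixed release named in the comment above (1.29 only).
FIXED_RELEASE = "1.29-2024.02.06"


def parse_release(version):
    """Split '<k8s minor>-<YYYY.MM.DD>' into (k8s_minor, release_date)."""
    k8s_minor, sep, stamp = version.partition("-")
    if not sep:
        raise ValueError(f"unexpected release version format: {version!r}")
    year, month, day = (int(part) for part in stamp.split("."))
    return k8s_minor, date(year, month, day)


def has_fix(version, fixed=FIXED_RELEASE):
    """True if `version` is the fixed release or a later one for the same minor."""
    k8s_minor, released = parse_release(version)
    fixed_minor, fixed_date = parse_release(fixed)
    if k8s_minor != fixed_minor:
        # The thread gives no fixed release for other minors, so do not guess.
        raise ValueError(f"no known fixed release for Kubernetes {k8s_minor}")
    return released >= fixed_date


print(has_fix("1.29-2024.01.09"))  # the AMI version from the report
print(has_fix("1.29-2024.02.06"))  # the release that carries the fix
```

The version a node group currently runs can be read from the node group's release version in the EKS console or API, and then passed to `has_fix`.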

Running into the same issue with 1.28 AMI. So is the only fix to upgrade to 1.29-2024.02.06 or higher?