Error in ReplicaSet "request did not complete within requested timeout" installing addon.

Question

Closed this issue 7 months ago · 2 comments

jonathan-d-palumbo commented 7 months ago

/kind bug

What happened?
When upgrading from "v1.29.1-eksbuild.1" to "v1.30.0-eksbuild.1", the update timed out. I removed the addon and then tried reinstalling it, and the addon is now stuck on "Creating". Upon further investigation, it appears the ReplicaSet for the controller is timing out while creating the controller pods. The ReplicaSet shows this error in its events:

Error creating: Timeout: request did not complete within requested timeout - context deadline exceeded

Is there somewhere I can look for additional details on what is causing the timeout?

What you expected to happen?

Expected the EBS CSI Driver to install successfully.

How to reproduce it (as minimally and precisely as possible)?
I am not sure exactly. This seems to be exclusive to a single cluster.

Anything else we need to know?:
ReplicaSet Describe

Name:           ebs-csi-controller-854b999fdc
Namespace:      kube-system
Selector:       app=ebs-csi-controller,app.kubernetes.io/name=aws-ebs-csi-driver,pod-template-hash=854b999fdc
Labels:         app=ebs-csi-controller
                app.kubernetes.io/component=csi-driver
                app.kubernetes.io/managed-by=EKS
                app.kubernetes.io/name=aws-ebs-csi-driver
                app.kubernetes.io/version=1.30.0
                pod-template-hash=854b999fdc
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 3
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/ebs-csi-controller
Replicas:       0 current / 2 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=ebs-csi-controller
                    app.kubernetes.io/component=csi-driver
                    app.kubernetes.io/managed-by=EKS
                    app.kubernetes.io/name=aws-ebs-csi-driver
                    app.kubernetes.io/version=1.30.0
                    pod-template-hash=854b999fdc
  Service Account:  ebs-csi-controller-sa
  Containers:
   ebs-plugin:
    Image:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/aws-ebs-csi-driver:v1.30.0
    Port:       9808/TCP
    Host Port:  0/TCP
    Args:
      controller
      --endpoint=$(CSI_ENDPOINT)
      --k8s-tag-cluster-id=guardian-prod-primary
      --batching=true
      --logging-format=text
      --user-agent-extra=eks
      --v=2
    Limits:
      memory:  256Mi
    Requests:
      cpu:      10m
      memory:   40Mi
    Liveness:   http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Readiness:  http-get http://:healthz/healthz delay=10s timeout=3s period=10s #success=1 #failure=5
    Environment:
      CSI_ENDPOINT:           unix:///var/lib/csi/sockets/pluginproxy/csi.sock
      CSI_NODE_NAME:           (v1:spec.nodeName)
      AWS_ACCESS_KEY_ID:      <set to the key 'key_id' in secret 'aws-secret'>      Optional: true
      AWS_SECRET_ACCESS_KEY:  <set to the key 'access_key' in secret 'aws-secret'>  Optional: true
      AWS_EC2_ENDPOINT:       <set to the key 'endpoint' of config map 'aws-meta'>  Optional: true
      AWS_REGION:             us-east-1
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-provisioner:
    Image:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-provisioner:v4.0.1-eks-1-30-2
    Port:       <none>
    Host Port:  <none>
    Args:
      --timeout=60s
      --csi-address=$(ADDRESS)
      --v=2
      --feature-gates=Topology=true
      --extra-create-metadata
      --leader-election=true
      --default-fstype=ext4
      --kube-api-qps=20
      --kube-api-burst=100
      --worker-threads=100
    Limits:
      memory:  256Mi
    Requests:
      cpu:     10m
      memory:  40Mi
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-attacher:
    Image:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-attacher:v4.5.1-eks-1-30-2
    Port:       <none>
    Host Port:  <none>
    Args:
      --timeout=60s
      --csi-address=$(ADDRESS)
      --v=2
      --leader-election=true
      --kube-api-qps=20
      --kube-api-burst=100
      --worker-threads=100
    Limits:
      memory:  256Mi
    Requests:
      cpu:     10m
      memory:  40Mi
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-snapshotter:
    Image:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-snapshotter:v7.0.2-eks-1-30-2
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=$(ADDRESS)
      --leader-election=true
      --extra-create-metadata
      --kube-api-qps=20
      --kube-api-burst=100
      --worker-threads=100
    Limits:
      memory:  256Mi
    Requests:
      cpu:     10m
      memory:  40Mi
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   csi-resizer:
    Image:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/csi-resizer:v1.10.1-eks-1-30-2
    Port:       <none>
    Host Port:  <none>
    Args:
      --timeout=60s
      --csi-address=$(ADDRESS)
      --v=2
      --handle-volume-inuse-error=false
      --leader-election=true
      --kube-api-qps=20
      --kube-api-burst=100
      --workers=100
    Limits:
      memory:  256Mi
    Requests:
      cpu:     10m
      memory:  40Mi
    Environment:
      ADDRESS:  /var/lib/csi/sockets/pluginproxy/csi.sock
    Mounts:
      /var/lib/csi/sockets/pluginproxy/ from socket-dir (rw)
   liveness-probe:
    Image:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/livenessprobe:v2.12.0-eks-1-30-2
    Port:       <none>
    Host Port:  <none>
    Args:
      --csi-address=/csi/csi.sock
    Limits:
      memory:  256Mi
    Requests:
      cpu:        10m
      memory:     40Mi
    Environment:  <none>
    Mounts:
      /csi from socket-dir (rw)
  Volumes:
   socket-dir:
    Type:               EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:             
    SizeLimit:          <unset>
  Priority Class Name:  system-cluster-critical
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age                   From                   Message
  ----     ------        ----                  ----                   -------
  Warning  FailedCreate  4m24s (x18 over 22m)  replicaset-controller  Error creating: Timeout: request did not complete within requested timeout - context deadline exceeded

Environment

Kubernetes version (use kubectl version): 1.28
Driver version: v1.30.0-eksbuild.1

Answer 1 · 2024-05-22T15:49:32.000Z

/close

dupe of #2046
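
On the question of where to look for more detail: this thread does not state the root cause, so the following is only a starting point. One commonly reported cause of pod-creation timeouts like this is an admission webhook that does not respond in time, but that is an assumption here, not something confirmed above. A minimal sketch that lists the events recorded against the ReplicaSet and the admission webhooks registered in the cluster; the helper names (`run`, `rs_events`, `list_webhooks`) and the `DRY_RUN` switch are hypothetical, and the namespace and ReplicaSet name are copied from the describe output in the report:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and the DRY_RUN switch are illustrative only.
NAMESPACE="kube-system"
RS_NAME="ebs-csi-controller-854b999fdc"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Events recorded against the ReplicaSet, oldest first.
rs_events() {
  run kubectl get events -n "$NAMESPACE" \
    --field-selector "involvedObject.kind=ReplicaSet,involvedObject.name=$RS_NAME" \
    --sort-by=.lastTimestamp
}

# Admission webhooks that may intercept pod creation (assumed suspect, not confirmed).
list_webhooks() {
  run kubectl get mutatingwebhookconfigurations
  run kubectl get validatingwebhookconfigurations
}
```

Calling `rs_events` and then `list_webhooks` against the affected cluster shows whether any webhook is registered that could sit in the pod-creation path; setting `DRY_RUN=1` first only prints the kubectl commands.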

Answer 2 · 2024-05-22T15:49:36.000Z

@ConnorJC3: Closing this issue.
