Worker nodes not joining the Cluster
Closed this issue · 1 comment
Hello!
Good day! I'm having a hard time getting this repo working :-( I tried to replicate almost every code change needed for it to work with the latest versions of EKS and Terraform, but the CoreDNS pods and worker nodes never become healthy.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-20-65.us-west-2.compute.internal NotReady <none> 67m v1.17.9-eks-4c6976
ip-10-0-21-87.us-west-2.compute.internal NotReady <none> 65m v1.17.9-eks-4c6976
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-dpk6d 0/1 Running 2 7m18s
aws-node-mb787 0/1 Running 8 7m17s
coredns-5c97f79574-l2klt 0/1 Pending 0 12m
coredns-5c97f79574-l6h5t 0/1 Pending 0 12m
kube-proxy-2cnpj 1/1 Running 0 7m18s
kube-proxy-wbb5w 1/1 Running 0 7m17s
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
aws-node-dpk6d 0/1 Running 2 7m50s 10.0.21.101 ip-10-0-21-101.us-west-2.compute.internal <none> <none>
aws-node-mb787 0/1 Running 8 7m49s 10.0.20.226 ip-10-0-20-226.us-west-2.compute.internal <none> <none>
coredns-5c97f79574-l2klt 0/1 Pending 0 13m <none> <none> <none> <none>
coredns-5c97f79574-l6h5t 0/1 Pending 0 13m <none> <none> <none> <none>
kube-proxy-2cnpj 1/1 Running 0 7m50s 10.0.21.101 ip-10-0-21-101.us-west-2.compute.internal <none> <none>
kube-proxy-wbb5w 1/1 Running 0 7m49s 10.0.20.226 ip-10-0-20-226.us-west-2.compute.internal <none> <none>
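The Pending CoreDNS pods follow from the NotReady nodes: aws-node (the VPC CNI) is failing readiness, so neither node can accept pods and the scheduler has nowhere to place CoreDNS. A quick first check is to list any node that is not Ready. This sketch parses a sample line copied from the output above so it runs anywhere; against a live cluster you'd pipe `kubectl get nodes` instead:

```shell
# Sample line captured from "kubectl get nodes" above (not a live query).
sample='ip-10-0-20-65.us-west-2.compute.internal NotReady <none> 67m v1.17.9-eks-4c6976'

# STATUS is the second whitespace-separated column.
status=$(echo "$sample" | awk '{print $2}')
echo "node status: $status"
# → node status: NotReady

# On a live cluster, the equivalent check would be:
#   kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1, $2}'
```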
$ kubectl describe pod coredns-65f879d6cd-gqmzv -n kube-system
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
----
$ kubectl logs coredns-75f97d857-ktrmx -n kube-system
Error from server: Get https://10.0.20.65:10250/containerLogs/kube-system/coredns-75f97d857-ktrmx/coredns: dial tcp 10.0.20.65:10250: i/o timeout
$ kubectl logs aws-node-lvhfv -n kube-system
Error from server: Get https://10.0.20.65:10250/containerLogs/kube-system/aws-node-lvhfv/aws-node: dial tcp 10.0.20.65:10250: i/o timeout
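The `i/o timeout` dialing port 10250 means the API server cannot reach the kubelet on the workers, which is consistent with a security group that does not allow control-plane-to-node traffic. A hedged sketch of the kind of ingress rule that would need to exist (the security group IDs below are placeholders, not values from this repo — substitute the worker and control-plane security groups your Terraform creates):

```shell
# Placeholder IDs for illustration only; EKS-managed security groups may
# already contain an equivalent rule.
aws ec2 authorize-security-group-ingress \
  --group-id sg-WORKER_NODES_PLACEHOLDER \
  --protocol tcp --port 10250 \
  --source-group sg-CONTROL_PLANE_PLACEHOLDER
```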
Did you test this end to end before checking it in? My guess is that the problem is some misconfiguration in the pod network. I'm still trying to debug; any thoughts would be appreciated.
thanks!
It turned out to be an issue with my AWS account, plus a version compatibility issue between the kubelet on the latest EKS worker AMI (1.17) and the slightly outdated control plane API server (1.16).
Response from AWS support:
As discussed, there is a hold on your account in the Oregon region, which is why you are facing issues launching or connecting to any instance.
Not to worry, I have reached out to my team on this and once an update has been received I shall circle back to you.
The compatibility issue was resolved by updating the control plane version from the EKS console.
thanks!