naumannt/tf-article

Worker nodes not joining the Cluster

Closed issue · 1 comment

Hello!

Good day! I'm having a hard time getting this repo working :-( I replicated almost every code change needed to make it work with the latest versions of EKS and Terraform, but the worker nodes stay NotReady, the aws-node pods never become ready, and the CoreDNS pods stay Pending.

└─($:~)─- kubectl get nodes
NAME                                       STATUS     ROLES    AGE   VERSION
ip-10-0-20-65.us-west-2.compute.internal   NotReady   <none>   67m   v1.17.9-eks-4c6976
ip-10-0-21-87.us-west-2.compute.internal   NotReady   <none>   65m   v1.17.9-eks-4c6976
─- kubectl get pods -n kube-system
NAME                      READY   STATUS    RESTARTS   AGE
aws-node-dpk6d             0/1     Running   2          7m18s
aws-node-mb787             0/1     Running   8          7m17s
coredns-5c97f79574-l2klt   0/1     Pending   0          12m
coredns-5c97f79574-l6h5t   0/1     Pending   0          12m
kube-proxy-2cnpj           1/1     Running   0          7m18s
kube-proxy-wbb5w           1/1     Running   0          7m17s

─- kubectl get pods -n kube-system -o wide
NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE                                        NOMINATED NODE   READINESS GATES
aws-node-dpk6d             0/1     Running   2          7m50s   10.0.21.101   ip-10-0-21-101.us-west-2.compute.internal   <none>           <none>
aws-node-mb787             0/1     Running   8          7m49s   10.0.20.226   ip-10-0-20-226.us-west-2.compute.internal   <none>           <none>
coredns-5c97f79574-l2klt   0/1     Pending   0          13m     <none>        <none>                                      <none>           <none>
coredns-5c97f79574-l6h5t   0/1     Pending   0          13m     <none>        <none>                                      <none>           <none>
kube-proxy-2cnpj           1/1     Running   0          7m50s   10.0.21.101   ip-10-0-21-101.us-west-2.compute.internal   <none>           <none>
kube-proxy-wbb5w           1/1     Running   0          7m49s   10.0.20.226   ip-10-0-20-226.us-west-2.compute.internal   <none>           <none>
─- kubectl describe pod coredns-65f879d6cd-gqmzv -n kube-system

Requests:
  cpu:        100m
  memory:     70Mi
Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness:    http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:  <none>
----
└─($:~/tf-article/Article 4|master)─- kubectl logs coredns-75f97d857-ktrmx -n kube-system
Error from server: Get https://10.0.20.65:10250/containerLogs/kube-system/coredns-75f97d857-ktrmx/coredns: dial tcp 10.0.20.65:10250: i/o timeout

└─($:~/tf-article/Article 4|master)─- kubectl logs aws-node-lvhfv -n kube-system
Error from server: Get https://10.0.20.65:10250/containerLogs/kube-system/aws-node-lvhfv/aws-node: dial tcp 10.0.20.65:10250: i/o timeout

Was this tested end to end before it was checked in? I suspect the problem is a misconfiguration in the pod network. I'm still debugging, so any thoughts on this would be appreciated.

thanks!

This turned out to be an issue with my AWS account, plus a version incompatibility between the latest EKS worker nodes, which run kubelet 1.17, and the slightly outdated control plane API server (1.16).

Response from AWS support:

As discussed, there is a hold in the Oregon region, which is the reason you are facing issues launching or connecting to any instance.
Not to worry, I have reached out to my team on this and once an update has been received I shall circle back to you.

The compatibility issue was resolved by updating the control plane version from the EKS console.
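
For anyone who hits the same symptom, here is a minimal sketch of a check for this kind of skew. It relies on the Kubernetes version skew policy, which does not support a kubelet newer than the API server. The function names are hypothetical and only for illustration. The inputs are the kubelet version string from the `kubectl get nodes` output above and the 1.16 control plane version mentioned above.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.17.9-eks-4c6976' or '1.16'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def kubelet_newer_than_api_server(kubelet: str, api_server: str) -> bool:
    """Return True when the kubelet's (major, minor) is ahead of the API server's."""
    return major_minor(kubelet) > major_minor(api_server)


if __name__ == "__main__":
    # Versions taken from this thread: worker kubelet vs. control plane.
    print(kubelet_newer_than_api_server("v1.17.9-eks-4c6976", "1.16"))
```

With the versions from this thread the check flags the skew. Once the control plane is on 1.17 it no longer does.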

thanks!