prabhatsharma/kubernetes-the-hard-way-aws

Waiting for services and endpoints to be initialized from apiserver

itsmetommy opened this issue · 3 comments

I'm having a problem getting DNS to work. It looks like the apiserver times out, but I can't figure out why. Any help is appreciated.

kubectl get pods -l k8s-app=kube-dns -n kube-system
NAME                        READY     STATUS             RESTARTS   AGE
kube-dns-864b8bdc77-8gntk   2/3       CrashLoopBackOff   5          5m
kubectl logs kube-dns-864b8bdc77-8gntk kubedns -n kube-system
...
I0210 21:32:34.232623       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0210 21:32:34.732601       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0210 21:32:35.232515       1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E0210 21:32:35.232914       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.32.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout
E0210 21:32:35.233133       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.32.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.32.0.1:443: i/o timeout
...
kubectl describe pod kube-dns-864b8bdc77-8gntk -n kube-system
Name:               kube-dns-864b8bdc77-8gntk
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-240-0-22/10.240.0.22
Start Time:         Sun, 10 Feb 2019 13:32:00 -0800
Labels:             k8s-app=kube-dns
                    pod-template-hash=4206468733
Annotations:        scheduler.alpha.kubernetes.io/critical-pod=
Status:             Running
IP:                 10.200.2.2
Controlled By:      ReplicaSet/kube-dns-864b8bdc77
Containers:
  kubedns:
    Container ID:  containerd://774ee159af06de9ada6c66965a73a9fc9560487fc6b1bdbc72acafdd75954fb3
    Image:         gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7
    Image ID:      gcr.io/google_containers/k8s-dns-kube-dns-amd64@sha256:f5bddc71efe905f4e4b96f3ca346414be6d733610c1525b98fff808f93966680
    Ports:         10053/UDP, 10053/TCP, 10055/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --config-dir=/kube-dns-config
      --v=2
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Sun, 10 Feb 2019 13:37:35 -0800
      Finished:     Sun, 10 Feb 2019 13:38:35 -0800
    Ready:          False
    Restart Count:  4
    Limits:
      memory:  170Mi
    Requests:
      cpu:      100m
      memory:   70Mi
    Liveness:   http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:  http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
    Environment:
      PROMETHEUS_PORT:  10055
    Mounts:
      /kube-dns-config from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-7lzvn (ro)
  dnsmasq:
    Container ID:  containerd://a4630ae32325ce609abd26489e5f014bb96da133235c1b1efe5e484bd3f6e6ed
    Image:         gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
    Image ID:      gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64@sha256:6cfb9f9c2756979013dbd3074e852c2d8ac99652570c5d17d152e0c0eb3321d6
    Ports:         53/UDP, 53/TCP
    Host Ports:    0/UDP, 0/TCP
    Args:
      -v=2
      -logtostderr
      -configDir=/etc/k8s/dns/dnsmasq-nanny
      -restartDnsmasq=true
      --
      -k
      --cache-size=1000
      --no-negcache
      --log-facility=-
      --server=/cluster.local/127.0.0.1#10053
      --server=/in-addr.arpa/127.0.0.1#10053
      --server=/ip6.arpa/127.0.0.1#10053
    State:          Running
      Started:      Sun, 10 Feb 2019 13:38:38 -0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 10 Feb 2019 13:36:27 -0800
      Finished:     Sun, 10 Feb 2019 13:38:37 -0800
    Ready:          True
    Restart Count:  3
    Requests:
      cpu:        150m
      memory:     20Mi
    Liveness:     http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-7lzvn (ro)
  sidecar:
    Container ID:  containerd://76121d4643f072450fa6daea8e582f39641ca41ef74045c59e04167e7ef22254
    Image:         gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
    Image ID:      gcr.io/google_containers/k8s-dns-sidecar-amd64@sha256:f80f5f9328107dc516d67f7b70054354b9367d31d4946a3bffd3383d83d7efe8
    Port:          10054/TCP
    Host Port:     0/TCP
    Args:
      --v=2
      --logtostderr
      --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
      --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
    State:          Running
      Started:      Sun, 10 Feb 2019 13:32:08 -0800
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Liveness:     http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-7lzvn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-dns-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-dns
    Optional:  true
  kube-dns-token-7lzvn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-dns-token-7lzvn
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age              From                     Message
  ----     ------     ----             ----                     -------
  Normal   Scheduled  6m               default-scheduler        Successfully assigned kube-system/kube-dns-864b8bdc77-8gntk to ip-10-240-0-22
  Normal   Pulling    6m               kubelet, ip-10-240-0-22  pulling image "gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7"
  Normal   Pulled     6m               kubelet, ip-10-240-0-22  Successfully pulled image "gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7"
  Normal   Pulling    6m               kubelet, ip-10-240-0-22  pulling image "gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7"
  Normal   Pulled     6m               kubelet, ip-10-240-0-22  Successfully pulled image "gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7"
  Normal   Created    6m               kubelet, ip-10-240-0-22  Created container
  Normal   Started    6m               kubelet, ip-10-240-0-22  Started container
  Normal   Pulling    6m               kubelet, ip-10-240-0-22  pulling image "gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7"
  Normal   Pulled     6m               kubelet, ip-10-240-0-22  Successfully pulled image "gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7"
  Normal   Created    6m               kubelet, ip-10-240-0-22  Created container
  Normal   Started    6m               kubelet, ip-10-240-0-22  Started container
  Normal   Started    5m (x2 over 6m)  kubelet, ip-10-240-0-22  Started container
  Normal   Created    5m (x2 over 6m)  kubelet, ip-10-240-0-22  Created container
  Normal   Pulled     5m               kubelet, ip-10-240-0-22  Container image "gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7" already present on machine
  Warning  Unhealthy  5m (x2 over 5m)  kubelet, ip-10-240-0-22  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  Unhealthy  5m (x8 over 6m)  kubelet, ip-10-240-0-22  Readiness probe failed: Get http://10.200.2.2:8081/readiness: dial tcp 10.200.2.2:8081: connect: connection refused
  Warning  BackOff    1m (x6 over 3m)  kubelet, ip-10-240-0-22  Back-off restarting failed container
kubectl get svc kubernetes -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2019-02-10T21:14:46Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "14"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: e4c3b4a9-2d78-11e9-aeb9-0662e484f590
spec:
  clusterIP: 10.32.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

I had this same issue. I was able to get around it by using the CoreDNS config in the original Kubernetes The Hard Way:

kubectl apply -f https://storage.googleapis.com/kubernetes-the-hard-way/coredns.yaml

I was then unable to get the correct response from nslookup on the busybox pod though:

% kubectl exec -ti busybox -- nslookup kubernetes
Server:    10.32.0.10
Address 1: 10.32.0.10

nslookup: can't resolve 'kubernetes'
command terminated with exit code 1

I'm still investigating that issue.

It looks like my issue was due to two route tables getting created in my VPC, with the subnet association and the kubernetes tag both ending up on the inactive route table. Once I moved the association and added the kubernetes tag to the correct route table, everything worked as expected.
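For anyone who wants to check for the same thing, here is a rough sketch that summarizes which route table holds the subnet association, the tag, and the routes. It assumes you have saved the JSON output of `aws ec2 describe-route-tables` for your VPC to a file. The function name, the sample IDs in the comments, and the default tag (`Name=kubernetes`) are my own assumptions for illustration, so adjust them to whatever your setup actually uses.

```python
import json


def summarize_route_tables(describe_output, subnet_id,
                           tag_key="Name", tag_value="kubernetes"):
    """Summarize each route table in `aws ec2 describe-route-tables` JSON.

    tag_key/tag_value are assumptions; pass the tag your cluster setup uses.
    """
    rows = []
    for table in describe_output.get("RouteTables", []):
        associations = table.get("Associations", [])
        rows.append({
            "id": table.get("RouteTableId"),
            # True if this is the VPC's main (default) route table.
            "main": any(a.get("Main") for a in associations),
            # True if the given subnet is explicitly associated with this table.
            "associated": any(a.get("SubnetId") == subnet_id
                              for a in associations),
            # True if the table carries the expected tag.
            "tagged": any(t.get("Key") == tag_key and t.get("Value") == tag_value
                          for t in table.get("Tags", [])),
            # IPv4 destinations only; other route types have no such key.
            "destinations": [r["DestinationCidrBlock"]
                             for r in table.get("Routes", [])
                             if "DestinationCidrBlock" in r],
        })
    return rows


def custom_table_ids(rows):
    """IDs of non-main route tables; more than one may indicate a duplicate."""
    return [row["id"] for row in rows if not row["main"]]


def load_describe_output(path):
    """Load JSON previously saved from the AWS CLI."""
    with open(path) as handle:
        return json.load(handle)
```

To use it, save the CLI output with something like `aws ec2 describe-route-tables --filters "Name=vpc-id,Values=<your VPC ID>" > route-tables.json`, then call `summarize_route_tables(load_describe_output("route-tables.json"), "<your subnet ID>")`. If `custom_table_ids` returns more than one ID, compare the rows to see whether the association, the tag, and the routes all sit on the same table.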

I had the exact same issue as @alexclarkofficial. I'm not sure how two different route tables got created, since https://github.com/prabhatsharma/kubernetes-the-hard-way-aws/blob/master/docs/03-compute-resources.md appears to create only one. Maybe I accidentally ran the command twice with slightly different params or something.

Anyway, all good now!

kubectl exec -it busybox -- nslookup kubernetes
Server:    10.32.0.10
Address 1: 10.32.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.32.0.1 kubernetes.default.svc.cluster.local