Yolean/kubernetes-kafka

zoo pod stuck in Pending

alexfrieden opened this issue · 8 comments

Hi folks,
I followed the instructions for standing up the k8s cluster, running things in the following order:

kubectl -n kafka apply -f configure/aws-storageclass-zookeeper-gp2.yml
kubectl -n kafka apply -f configure/aws-storageclass-broker-gp2.yml
kubectl -n kafka apply -f 00-namespace.yml
kubectl -n kafka apply -f rbac-namespace-default/
kubectl -n kafka apply -f zookeeper/
kubectl -n kafka apply -f kafka/
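
As a sanity check after applying (not part of the repo's instructions), the registered storage classes and the claims the stateful sets created can be listed:

# Sanity check: confirm the storage classes registered and see which
# PersistentVolumeClaims were created (names will vary per cluster)
kubectl get storageclass
kubectl -n kafka get pvc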

The logs seem fine, but when I list the pods, zoo-0 and zoo-1 are stuck in Pending:

kubectl -n kafka get all
NAME          READY   STATUS    RESTARTS   AGE
pod/kafka-0   1/1     Running   0          4m
pod/kafka-1   1/1     Running   0          4m
pod/kafka-2   1/1     Running   0          4m
pod/pzoo-0    1/1     Running   0          6m
pod/pzoo-1    1/1     Running   0          6m
pod/pzoo-2    1/1     Running   0          6m
pod/zoo-0     0/1     Pending   0          6m
pod/zoo-1     0/1     Pending   0          6m

NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/bootstrap   ClusterIP   100.68.139.100   <none>        9092/TCP            4m
service/broker      ClusterIP   None             <none>        9092/TCP            4m
service/pzoo        ClusterIP   None             <none>        2888/TCP,3888/TCP   6m
service/zoo         ClusterIP   None             <none>        2888/TCP,3888/TCP   6m
service/zookeeper   ClusterIP   100.67.78.247    <none>        2181/TCP            6m

NAME                     DESIRED   CURRENT   AGE
statefulset.apps/kafka   3         3         4m
statefulset.apps/pzoo    3         3         6m
statefulset.apps/zoo     2         2         6m

Any thoughts on what is going on here? Any help is appreciated.

What is the output if you run kubectl -n kafka describe pod zoo-0? My initial thought is that you have run out of resources on your nodes.

@Jacobh2 looks like it can't find a volume even though the storage class exists:

kubectl -n kafka describe pod zoo-0
Name:           zoo-0
Namespace:      kafka
Node:           <none>
Labels:         app=zookeeper
                controller-revision-hash=zoo-7c5447d489
                statefulset.kubernetes.io/pod-name=zoo-0
                storage=persistent-regional
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  StatefulSet/zoo
Init Containers:
  init-config:
    Image:      solsson/kafka-initutils@sha256:2cdb90ea514194d541c7b869ac15d2d530ca64889f56e270161fe4e5c3d076ea
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      /etc/kafka-configmap/init.sh
    Environment:
      ID_OFFSET:  4
    Mounts:
      /etc/kafka from config (rw)
      /etc/kafka-configmap from configmap (rw)
      /var/lib/zookeeper from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4fpbx (ro)
Containers:
  zookeeper:
    Image:       solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1
    Ports:       2181/TCP, 2888/TCP, 3888/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      ./bin/zookeeper-server-start.sh
      /etc/kafka/zookeeper.properties
    Limits:
      memory:  120Mi
    Requests:
      cpu:      10m
      memory:   100Mi
    Readiness:  exec [/bin/sh -c [ "imok" = "$(echo ruok | nc -w 1 -q 1 127.0.0.1 2181)" ]] delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      KAFKA_LOG4J_OPTS:  -Dlog4j.configuration=file:/etc/kafka/log4j.properties
    Mounts:
      /etc/kafka from config (rw)
      /var/lib/zookeeper from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4fpbx (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-zoo-0
    ReadOnly:   false
  configmap:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      zookeeper-config
    Optional:  false
  config:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-4fpbx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4fpbx
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  2m4s (x3914 over 19h)  default-scheduler  pod has unbound PersistentVolumeClaims (repeated 2 times)
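
For reference: the event blames the claim rather than the pod, so the claim itself is the thing to inspect next. A minimal check, using the claim name data-zoo-0 from the output above:

# The PVC's Events section usually names the StorageClass it is waiting for
kubectl -n kafka describe pvc data-zoo-0
# Compare that against the classes that actually exist in the cluster
kubectl get storageclass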

I have the same problem. You'll notice the zoo-0 and zoo-1 pods claim the kafka-zookeeper-regional storage class, but no example configuration file exists for it for AWS. Not sure what the recommendation is here, though.
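
A sketch of what such a file might look like, modeled on the repo's other AWS gp2 examples; the class name kafka-zookeeper-regional matches the zoo claims, but note that EBS volumes are zonal, so "regional" is only a name here:

# Hypothetical configure/aws-storageclass-zookeeper-regional.yml,
# mirroring the existing gp2 examples; EBS cannot replicate across zones
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: kafka-zookeeper-regional
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2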

Storage classes were always meant to be custom. The files in https://github.com/Yolean/kubernetes-kafka/tree/master/configure are basically just examples. With regional volumes, GKE clusters will have to adapt the examples as well.

It is of course optional to have the zoo PVs span multiple availability zones. For example, your cluster might be in a single zone, or you might be fine with restricting all zookeeper pods to the zones of their respective volumes.
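
For GKE, a hedged sketch of such an adaptation using regional persistent disks (the zone names are placeholders for your own cluster's zones):

# Sketch of a GKE regional StorageClass; replace the zones with two
# zones your cluster's nodes actually run in
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: kafka-zookeeper-regional
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
  zones: europe-west1-b, europe-west1-c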

@solsson what is the recommendation? To add availability zones?

There is no recommendation :) There are only examples. You need to make the trade-offs yourself: cost vs. availability, etc. The zookeeper readme refers to some background, but I see now that it's from before #191.

Thanks @solsson. I guess I am still a little unclear on what "pod has unbound PersistentVolumeClaims" refers to. I have changed the storage class names to "kafka-zookeeper".
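
One likely explanation, assuming the claims were created before the rename: a PVC's storageClassName is immutable, so renaming the class afterwards does not rebind existing claims; they stay unbound and the pod stays Pending. The stale claims have to be deleted so the StatefulSet can recreate them against the new class (data-zoo-1 assumed from the usual <template>-<pod> naming):

# Claims created against the old class name never rebind; delete them
# and let the StatefulSet recreate them (data on those volumes is lost)
kubectl -n kafka delete pvc data-zoo-0 data-zoo-1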

@solsson weird, I deleted everything in the namespace, deleted all storage classes, and it seems to be running now (at least the containers have started and everything is in the Running state).
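
That would fit the explanation above: a StatefulSet never deletes its claims, so as long as the old PVCs (bound to a class name that no longer matched anything) were still around, the pods stayed Pending. Once the cleanup removed them, fresh claims could bind against the current storage classes.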