zoo pod stuck in Pending
alexfrieden opened this issue · 8 comments
Hi folks,
I followed the instructions for standing up the k8s cluster, running things in the following order:
kubectl -n kafka apply -f configure/aws-storageclass-zookeeper-gp2.yml
kubectl -n kafka apply -f configure/aws-storageclass-broker-gp2.yml
kubectl -n kafka apply -f 00-namespace.yml
kubectl -n kafka apply -f rbac-namespace-default/
kubectl -n kafka apply -f zookeeper/
kubectl -n kafka apply -f kafka/
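(For anyone else following these steps: right after applying, the storage classes and the claims that should bind to them can be checked with plain kubectl:)

```
kubectl get storageclass
kubectl -n kafka get pvc
```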
The logs seem fine, but when I check the pods, pod/zoo-0 and pod/zoo-1 are stuck in Pending:
kubectl -n kafka get all
NAME READY STATUS RESTARTS AGE
pod/kafka-0 1/1 Running 0 4m
pod/kafka-1 1/1 Running 0 4m
pod/kafka-2 1/1 Running 0 4m
pod/pzoo-0 1/1 Running 0 6m
pod/pzoo-1 1/1 Running 0 6m
pod/pzoo-2 1/1 Running 0 6m
pod/zoo-0 0/1 Pending 0 6m
pod/zoo-1 0/1 Pending 0 6m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/bootstrap ClusterIP 100.68.139.100 <none> 9092/TCP 4m
service/broker ClusterIP None <none> 9092/TCP 4m
service/pzoo ClusterIP None <none> 2888/TCP,3888/TCP 6m
service/zoo ClusterIP None <none> 2888/TCP,3888/TCP 6m
service/zookeeper ClusterIP 100.67.78.247 <none> 2181/TCP 6m
NAME DESIRED CURRENT AGE
statefulset.apps/kafka 3 3 4m
statefulset.apps/pzoo 3 3 6m
statefulset.apps/zoo 2 2 6m
Any thoughts on what is going on here? Any help is appreciated.
What is the output if you do kubectl -n kafka describe pod zoo-0? My initial thought is that you have run out of resources on your nodes.
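If it is resources, describing the nodes will show how much is already allocated. These are standard kubectl commands, nothing specific to this repo:

```
# Per-node view of the requests/limits already allocated
kubectl describe nodes | grep -A 5 "Allocated resources"
# The scheduler's reason for not placing the pod shows up under Events
kubectl -n kafka describe pod zoo-0 | grep -A 10 Events
```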
@Jacobh2 looks like it can't find the volume even though the storage class exists.
kubectl -n kafka describe pod zoo-0
Name: zoo-0
Namespace: kafka
Node: <none>
Labels: app=zookeeper
controller-revision-hash=zoo-7c5447d489
statefulset.kubernetes.io/pod-name=zoo-0
storage=persistent-regional
Annotations: <none>
Status: Pending
IP:
Controlled By: StatefulSet/zoo
Init Containers:
init-config:
Image: solsson/kafka-initutils@sha256:2cdb90ea514194d541c7b869ac15d2d530ca64889f56e270161fe4e5c3d076ea
Port: <none>
Host Port: <none>
Command:
/bin/bash
/etc/kafka-configmap/init.sh
Environment:
ID_OFFSET: 4
Mounts:
/etc/kafka from config (rw)
/etc/kafka-configmap from configmap (rw)
/var/lib/zookeeper from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4fpbx (ro)
Containers:
zookeeper:
Image: solsson/kafka:2.1.0@sha256:ac3f06d87d45c7be727863f31e79fbfdcb9c610b51ba9cf03c75a95d602f15e1
Ports: 2181/TCP, 2888/TCP, 3888/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
./bin/zookeeper-server-start.sh
/etc/kafka/zookeeper.properties
Limits:
memory: 120Mi
Requests:
cpu: 10m
memory: 100Mi
Readiness: exec [/bin/sh -c [ "imok" = "$(echo ruok | nc -w 1 -q 1 127.0.0.1 2181)" ]] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
KAFKA_LOG4J_OPTS: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
Mounts:
/etc/kafka from config (rw)
/var/lib/zookeeper from data (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4fpbx (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-zoo-0
ReadOnly: false
configmap:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: zookeeper-config
Optional: false
config:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-4fpbx:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-4fpbx
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m4s (x3914 over 19h) default-scheduler pod has unbound PersistentVolumeClaims (repeated 2 times)
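The unbound claim itself can be inspected directly; data-zoo-0 is the claim name from the describe output above, and these are standard kubectl commands:

```
kubectl -n kafka get pvc
kubectl -n kafka describe pvc data-zoo-0
kubectl get storageclass
```

If the claim requests a storageClassName that kubectl get storageclass does not list, it can never bind, which matches the FailedScheduling event.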
I have the same problem. You'll notice the zoo-0 and zoo-1 pods make a claim for the kafka-zookeeper-regional storage class, but a configuration file for it doesn't exist for AWS. Not sure what the recommendation here is, though.
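A sketch of what such a file could look like, modeled on the repo's existing configure/aws-storageclass-zookeeper-gp2.yml (the file name below is made up; since EBS volumes are zonal, this simply provisions ordinary gp2 disks under the class name the zoo claims ask for):

```yaml
# Hypothetical configure/aws-storageclass-zookeeper-regional.yml.
# EBS has no regional replication, so this is a plain zonal gp2 class
# that merely satisfies claims requesting kafka-zookeeper-regional.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: kafka-zookeeper-regional
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```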
Storage classes were always meant to be custom. The stuff in https://github.com/Yolean/kubernetes-kafka/tree/master/configure is basically just examples. With regional volumes, GKE clusters will have to adapt the examples as well.
It is of course optional to have zoo PVs span multiple availability zones. For example, your cluster might be in a single zone, or you may be fine with restricting all zookeeper pods to the zone of their respective volumes.
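For comparison, a regional class on GKE would look roughly like this, using the stock gce-pd provisioner's replication-type parameter (the class name matches what the zoo claims request; everything else is one possible choice, not the repo's official config):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: kafka-zookeeper-regional
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: regional-pd
```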
@solsson what is the recommendation? To add availability zones?
There is no recommendation :) There are only examples. You need to make the trade-offs yourself: cost, availability, etc. The zookeeper readme refers to some background, but I see now that it's from before #191.
Thanks @solsson. I guess I am still a little unclear as to what "pod has unbound PersistentVolumeClaims" refers to. I have changed the storage class names to "kafka-zookeeper".
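One possible explanation (an assumption, not confirmed in this thread): PVCs created from a StatefulSet's volumeClaimTemplates keep the storageClassName they were created with, so renaming the class in the manifests does not rebind claims that already exist. The mismatch is easy to spot:

```
kubectl -n kafka get pvc data-zoo-0 -o jsonpath='{.spec.storageClassName}'
kubectl get storageclass
```

Deleting the stale claims lets the statefulset recreate them against the new class.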
@solsson weird, I deleted everything in the namespace, deleted all storage classes, and it seems to be running now (at least the containers have started and everything is in the Running state).