Yolean/kubernetes-kafka

Readiness probe failed for kafka

selkabli opened this issue · 10 comments

Hi,
this is my first time using Kafka, so maybe I'm missing something. Can you please help?

```
NAME          READY   STATUS             RESTARTS   AGE
pod/kafka-0   1/1     Running            0          50m
pod/kafka-1   1/1     Running            0          50m
pod/kafka-2   0/1     CrashLoopBackOff   6          12m
pod/pzoo-0    1/1     Running            0          57m
pod/pzoo-1    1/1     Running            0          57m
pod/pzoo-2    1/1     Running            0          57m
pod/zoo-0     1/1     Running            0          56m
pod/zoo-1     1/1     Running            0          56m

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/bootstrap   ClusterIP   10.233.9.140    <none>        9092/TCP            51m
service/broker      ClusterIP   None            <none>        9092/TCP            52m
service/pzoo        ClusterIP   None            <none>        2888/TCP,3888/TCP   59m
service/zoo         ClusterIP   None            <none>        2888/TCP,3888/TCP   58m
service/zookeeper   ClusterIP   10.233.35.111   <none>        2181/TCP            58m

NAME                     READY   AGE
statefulset.apps/kafka   2/3     50m
statefulset.apps/pzoo    3/3     57m
statefulset.apps/zoo     2/2     56m
```

```
[root@node1 ~]# kubectl get events -n kafka | grep Warn | grep pod/kafka-2
45m         Warning   Unhealthy               pod/kafka-2                          Readiness probe failed: dial tcp 10.233.90.28:9092: connect: connection refused
41m         Warning   BackOff                 pod/kafka-2                          Back-off restarting failed container
32m         Warning   Unhealthy               pod/kafka-2                          Readiness probe failed: dial tcp 10.233.90.29:9092: connect: connection refused
17m         Warning   BackOff                 pod/kafka-2                          Back-off restarting failed container
7m50s       Warning   Unhealthy               pod/kafka-2                          Readiness probe failed: dial tcp 10.233.90.30:9092: connect: connection refused
2m50s       Warning   BackOff                 pod/kafka-2                          Back-off restarting failed container
[root@node1 ~]# kubectl logs kafka-2 -n kafka
[2019-06-21 23:51:06,385] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-06-21 23:51:07,197] INFO starting (kafka.server.KafkaServer)
[2019-06-21 23:51:07,198] INFO Connecting to zookeeper on zookeeper:2181 (kafka.server.KafkaServer)
[2019-06-21 23:51:07,226] INFO [ZooKeeperClient] Initializing a new session to zookeeper:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:07,232] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:host.name=kafka-2.broker.kafka.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.version=11.0.2 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.home=/usr/lib/jvm/jdk-11 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.class.path=/opt/kafka/libs/extensions/*:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka/bin/../libs/connect-api-2.2.1.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka/bin/../libs/connect-file-2.2.1.jar:/opt/kafka/bin/../libs/connect-json-2.2.1.jar:/opt/kafka/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka/bin/../libs/extensions:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-c
lient-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka/bin/../libs/scala-library-2.12.8.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.11.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka/bin/../libs/zstd-jni-1.3.8-1.jar (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.version=3.10.0-957.12.1.el7.x86_64 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.dir=/opt/kafka (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,234] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@561868a0 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,251] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:13,254] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:27,265] INFO Opening socket connection to server zookeeper:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-21 23:51:27,377] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:27,380] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2019-06-21 23:51:27,382] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:27,387] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
        at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
        at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
        at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:96)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
        at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:361)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:385)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:205)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
        at kafka.Kafka$.main(Kafka.scala:75)
        at kafka.Kafka.main(Kafka.scala)
[2019-06-21 23:51:27,390] INFO shutting down (kafka.server.KafkaServer)
[2019-06-21 23:51:27,403] INFO shut down completed (kafka.server.KafkaServer)
[2019-06-21 23:51:27,404] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-06-21 23:51:27,407] INFO shutting down (kafka.server.KafkaServer)
```

Looks like two kafka pods succeed and one fails. It could be 463e1c7, though that would be strange because there are 5 zookeeper pods to reach for 3 kafka brokers. Does everything but kafka-2 stay ready, or are there other events? Do the zookeeper services have the expected endpoints?

Please use ``` when you post command output. It makes it a lot more readable. See https://guides.github.com/features/mastering-markdown/

I changed the zookeeper config to maxClientCnxns=2, but the same issue still persists.

[root@node1 ~]# kubectl get svc -n kafka
NAME        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
bootstrap   ClusterIP   10.233.9.140    <none>        9092/TCP            13h
broker      ClusterIP   None            <none>        9092/TCP            13h
pzoo        ClusterIP   None            <none>        2888/TCP,3888/TCP   13h
zoo         ClusterIP   None            <none>        2888/TCP,3888/TCP   13h
zookeeper   ClusterIP   10.233.35.111   <none>        2181/TCP            13h
[root@node1 ~]# kubectl describe svc zookeeper -n kafka
Name:              zookeeper
Namespace:         kafka
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"zookeeper","namespace":"kafka"},"spec":{"ports":[{"name":"client"...
Selector:          app=zookeeper
Type:              ClusterIP
IP:                10.233.35.111
Port:              client  2181/TCP
TargetPort:        2181/TCP
Endpoints:         10.233.90.24:2181,10.233.90.26:2181,10.233.92.33:2181 + 2 more...
Session Affinity:  None
Events:            <none>
[root@node1 ~]# kubectl get pods -n kafka -o wide
NAME      READY   STATUS             RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
kafka-0   1/1     Running            0          13h   10.233.92.35   node3   <none>           <none>
kafka-1   1/1     Running            0          13h   10.233.96.34   node2   <none>           <none>
kafka-2   0/1     CrashLoopBackOff   14         52m   10.233.90.31   node1   <none>           <none>
pzoo-0    1/1     Running            0          13h   10.233.96.30   node2   <none>           <none>
pzoo-1    1/1     Running            1          13h   10.233.92.33   node3   <none>           <none>
pzoo-2    1/1     Running            1          13h   10.233.90.24   node1   <none>           <none>
zoo-0     1/1     Running            0          13h   10.233.96.32   node2   <none>           <none>
zoo-1     1/1     Running            1          13h   10.233.90.26   node1   <none>           <none>

I'm puzzled. At this point I can't come up with a single hypothesis to test. Something might come to mind later, but my only advice now is to dig around and do different experiments that involve killing pods.

Edit: zookeeper logs could possibly provide clues.
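One experiment along those lines, as a minimal sketch rather than a confirmed diagnosis: probe each ZooKeeper server directly and see whether it refuses the connection, accepts it and then goes silent, or answers normally. `ruok` is ZooKeeper's four-letter health command, and a healthy server replies `imok`. This assumes four-letter commands are enabled on the servers and that Python 3 is available wherever you run it, neither of which is confirmed in this thread. The function names `parse_connect_string` and `probe` are made up for illustration.

```python
import socket


def parse_connect_string(connect, default_port=2181):
    """Split a ZooKeeper connect string into (host, port) pairs.

    Any chroot suffix (for example "/kafka") is ignored.
    """
    hosts = connect.split("/", 1)[0]
    pairs = []
    for entry in hosts.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs


def probe(host, port, timeout=3.0):
    """Send "ruok" to one server and classify the outcome.

    Returns "ok" if the server answers "imok", "no-reply" if the
    connection is accepted but nothing useful comes back, and
    "error: <reason>" if connecting or reading fails.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError as exc:
        return f"error: {exc}"
    return "ok" if reply == b"imok" else "no-reply"
```

Calling `probe(host, port)` for each pair returned by `parse_connect_string("zookeeper:2181")`, once from a pod on node1 and once from a pod on another node, would show whether the failure follows the node or the ZooKeeper server.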

@solsson I hit the same error. When I move kafka and zk to a namespace other than kafka, the kafka init-config container reports this error:

+ cp /etc/kafka-configmap/log4j.properties /etc/kafka/
+ KAFKA_BROKER_ID=2
+ SEDS=("s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/")
+ LABELS=kafka-broker-id=2
+ ANNOTATIONS=
+ hash kubectl
++ kubectl get node metrosecurity-2 '-o=go-template={{index .metadata.labels "failure-domain.beta.kubernetes.io/zone"}}'
Error from server (Forbidden): nodes "metrosecurity-2" is forbidden: User "system:serviceaccount:zhihuiaj:default" cannot get resource "nodes" in API group "" at the cluster scope
+ ZONE=

If the namespace is kafka, cluster init works and the connection to zk is fine. But that is not what I want, because my project lives in another namespace.
So I kept kafka in the kafka namespace and put zk in the other namespace. To reach zk across namespaces, I created a Service in the kafka namespace:
apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-port2
  namespace: kafka
spec:
  ports:
  - name: kafka-port2
    port: 2181
    protocol: TCP
    targetPort: 2181
  sessionAffinity: None
  type: ExternalName
  externalName: zk-cli

Then kafka reported the error above:

[2019-06-26 05:52:11,975] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-06-26 05:52:12,472] INFO starting (kafka.server.KafkaServer)
[2019-06-26 05:52:12,472] INFO Connecting to zookeeper on zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181 (kafka.server.KafkaServer)
[2019-06-26 05:52:12,492] INFO [ZooKeeperClient] Initializing a new session to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:12,497] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:host.name=kafka-0.kafka-cluster.kafka.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.version=11.0.2 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.home=/usr/lib/jvm/jdk-11 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.class.path=/opt/kafka/libs/extensions/*:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka/bin/../libs/connect-api-2.2.1.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka/bin/../libs/connect-file-2.2.1.jar:/opt/kafka/bin/../libs/connect-json-2.2.1.jar:/opt/kafka/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka/bin/../libs/extensions:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-c
lient-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka/bin/../libs/scala-library-2.12.8.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.11.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka/bin/../libs/zstd-jni-1.3.8-1.jar (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.version=5.1.9-050109-generic (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.dir=/opt/kafka (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,498] INFO Initiating client connection, connectString=zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@6138e79a (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,509] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:12,517] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:12,524] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:12,527] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,251] INFO Opening socket connection to server zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,252] INFO Socket connection established to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,252] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,058] INFO Opening socket connection to server zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,059] INFO Socket connection established to zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,059] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,985] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,985] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,986] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,766] INFO Opening socket connection to server zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,767] INFO Socket connection established to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,768] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,208] INFO Opening socket connection to server zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,209] INFO Socket connection established to zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,210] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,384] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,384] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,385] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,512] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:19,265] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:19,269] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:19,270] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:19,278] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
        at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
        at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
        at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:96)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
        at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:361)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:385)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:205)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
        at kafka.Kafka$.main(Kafka.scala:75)
        at kafka.Kafka.main(Kafka.scala)
[2019-06-26 05:52:19,281] INFO shutting down (kafka.server.KafkaServer)
[2019-06-26 05:52:19,291] INFO shut down completed (kafka.server.KafkaServer)
[2019-06-26 05:52:19,291] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-06-26 05:52:19,294] INFO shutting down (kafka.server.KafkaServer)

@amateu It looks like yours is a custom setup with ExternalName for zookeeper. Why don't you edit zookeeper.connect in Kafka's config instead? In addition you seem to have quite specific RBAC in your cluster and you probably need to customize the RBAC resources.

With @selkabli's issue what is most interesting is that only kafka-2 fails. I think in your setup @amateu all brokers will fail.

@solsson Yes, all brokers fail. The cause really is RBAC. I tried creating RBAC resources for my project so that zk and kafka could be deployed there instead of in the kafka namespace, but the zk connection still timed out.
So I deployed zk and kafka in another clean test environment without RBAC. The zk connection still timed out, with the same error as before. Finally, I changed the zk YAML, and now the zk and kafka clusters are healthy.
I still can't find the specific reason for the earlier problem.
As for @selkabli's issue, I think he might have used hostNetwork: true.

@solsson The problem happens only on node1, which is the master of my cluster. Any clues why?

The taint is already removed from the master, so it's not related to the taint.

That's an important observation. I haven't tried running on a master. I have no clue why the zookeeper connection would fail from there.

I'm having the same issue as @selkabli. I am deploying on a bare-metal k8s cluster with local persistent volumes. 1 broker (out of 3) always fails to start correctly.

Never mind, it seems the PV on one of the nodes had a problem, which caused this. I moved the PV to another node and it works fine.