Readiness probe failed for kafka
selkabli opened this issue · 10 comments
Hi,
This is my first time using Kafka, so maybe I'm missing something. Can you please help?
```
NAME          READY   STATUS             RESTARTS   AGE
pod/kafka-0   1/1     Running            0          50m
pod/kafka-1   1/1     Running            0          50m
pod/kafka-2   0/1     CrashLoopBackOff   6          12m
pod/pzoo-0    1/1     Running            0          57m
pod/pzoo-1    1/1     Running            0          57m
pod/pzoo-2    1/1     Running            0          57m
pod/zoo-0     1/1     Running            0          56m
pod/zoo-1     1/1     Running            0          56m

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/bootstrap   ClusterIP   10.233.9.140    <none>        9092/TCP            51m
service/broker      ClusterIP   None            <none>        9092/TCP            52m
service/pzoo        ClusterIP   None            <none>        2888/TCP,3888/TCP   59m
service/zoo         ClusterIP   None            <none>        2888/TCP,3888/TCP   58m
service/zookeeper   ClusterIP   10.233.35.111   <none>        2181/TCP            58m

NAME                     READY   AGE
statefulset.apps/kafka   2/3     50m
statefulset.apps/pzoo    3/3     57m
statefulset.apps/zoo     2/2     56m
```
```
[root@node1 ~]# kubectl get events -n kafka | grep Warn |grep pod/kafka-2
45m Warning Unhealthy pod/kafka-2 Readiness probe failed: dial tcp 10.233.90.28:9092: connect: connection refused
41m Warning BackOff pod/kafka-2 Back-off restarting failed container
32m Warning Unhealthy pod/kafka-2 Readiness probe failed: dial tcp 10.233.90.29:9092: connect: connection refused
17m Warning BackOff pod/kafka-2 Back-off restarting failed container
7m50s Warning Unhealthy pod/kafka-2 Readiness probe failed: dial tcp 10.233.90.30:9092: connect: connection refused
2m50s Warning BackOff pod/kafka-2 Back-off restarting failed container
[root@node1 ~]# kubectl logs kafka-2 -n kafka
[2019-06-21 23:51:06,385] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-06-21 23:51:07,197] INFO starting (kafka.server.KafkaServer)
[2019-06-21 23:51:07,198] INFO Connecting to zookeeper on zookeeper:2181 (kafka.server.KafkaServer)
[2019-06-21 23:51:07,226] INFO [ZooKeeperClient] Initializing a new session to zookeeper:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:07,232] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:host.name=kafka-2.broker.kafka.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.version=11.0.2 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.home=/usr/lib/jvm/jdk-11 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.class.path=/opt/kafka/libs/extensions/*:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka/bin/../libs/connect-api-2.2.1.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka/bin/../libs/connect-file-2.2.1.jar:/opt/kafka/bin/../libs/connect-json-2.2.1.jar:/opt/kafka/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka/bin/../libs/extensions:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-c
lient-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka/bin/../libs/scala-library-2.12.8.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.11.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka/bin/../libs/zstd-jni-1.3.8-1.jar (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.version=3.10.0-957.12.1.el7.x86_64 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.dir=/opt/kafka (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,234] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@561868a0 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,251] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:13,254] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:27,265] INFO Opening socket connection to server zookeeper:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-21 23:51:27,377] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:27,380] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2019-06-21 23:51:27,382] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:27,387] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:96)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:361)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:385)
at kafka.server.KafkaServer.startup(KafkaServer.scala:205)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2019-06-21 23:51:27,390] INFO shutting down (kafka.server.KafkaServer)
[2019-06-21 23:51:27,403] INFO shut down completed (kafka.server.KafkaServer)
[2019-06-21 23:51:27,404] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-06-21 23:51:27,407] INFO shutting down (kafka.server.KafkaServer)
```
Looks like two Kafka pods succeed and one fails. It could be 463e1c7, though that would be strange because there are 5 ZooKeeper pods to reach for 3 Kafka brokers. Does everything but kafka-2 stay ready, or are there other events? Do the ZooKeeper services have the expected endpoints?
Please use ``` when you post command output. Makes it a lot more readable. See https://guides.github.com/features/mastering-markdown/
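Both questions above can be answered with standard kubectl commands. A minimal sketch, assuming the `kafka` namespace and the `zookeeper` Service name shown in the output in this thread; the function name `check_zk` is made up for illustration:

```shell
# Hypothetical helper; namespace and Service name are taken from this thread.
check_zk() {
  ns="${1:-kafka}"
  # The client Service should list one endpoint per ready ZooKeeper pod.
  kubectl get endpoints zookeeper -n "$ns"
  # Show warning events for every object in the namespace, not only kafka-2.
  kubectl get events -n "$ns" --field-selector type=Warning
}
```

Call it as `check_zk kafka`. With the five ZooKeeper pods listed above, the endpoints output would be expected to show five addresses on port 2181.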
I changed the ZooKeeper config to maxClientCnxns=2, but the same issue still persists.
```
[root@node1 ~]# kubectl get svc -n kafka
NAME        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
bootstrap   ClusterIP   10.233.9.140    <none>        9092/TCP            13h
broker      ClusterIP   None            <none>        9092/TCP            13h
pzoo        ClusterIP   None            <none>        2888/TCP,3888/TCP   13h
zoo         ClusterIP   None            <none>        2888/TCP,3888/TCP   13h
zookeeper   ClusterIP   10.233.35.111   <none>        2181/TCP            13h
[root@node1 ~]# kubectl describe svc zookeeper -n kafka
Name:              zookeeper
Namespace:         kafka
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"zookeeper","namespace":"kafka"},"spec":{"ports":[{"name":"client"...
Selector:          app=zookeeper
Type:              ClusterIP
IP:                10.233.35.111
Port:              client  2181/TCP
TargetPort:        2181/TCP
Endpoints:         10.233.90.24:2181,10.233.90.26:2181,10.233.92.33:2181 + 2 more...
Session Affinity:  None
Events:            <none>
[root@node1 ~]# kubectl get pods -n kafka -o wide
NAME      READY   STATUS             RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
kafka-0   1/1     Running            0          13h   10.233.92.35   node3   <none>           <none>
kafka-1   1/1     Running            0          13h   10.233.96.34   node2   <none>           <none>
kafka-2   0/1     CrashLoopBackOff   14         52m   10.233.90.31   node1   <none>           <none>
pzoo-0    1/1     Running            0          13h   10.233.96.30   node2   <none>           <none>
pzoo-1    1/1     Running            1          13h   10.233.92.33   node3   <none>           <none>
pzoo-2    1/1     Running            1          13h   10.233.90.24   node1   <none>           <none>
zoo-0     1/1     Running            0          13h   10.233.96.32   node2   <none>           <none>
zoo-1     1/1     Running            1          13h   10.233.90.26   node1   <none>           <none>
```
I'm puzzled. At this point I can't come up with a single hypothesis to test. Something might come to mind later, but my only advice now is to dig around and do different experiments that involve killing pods.
Edit: zookeeper logs could possibly provide clues.
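To collect those ZooKeeper logs in one go, something like the following could work. It is only a sketch: the pod names and the `kafka` namespace are copied from the listings in this thread, and `zk_logs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the recent log of every ZooKeeper pod named in this thread.
zk_logs() {
  ns="${1:-kafka}"
  for pod in pzoo-0 pzoo-1 pzoo-2 zoo-0 zoo-1; do
    echo "== $pod =="
    # --tail limits the output; add -c <container> if the pod runs several containers.
    kubectl logs "$pod" -n "$ns" --tail=50
  done
}
```

Run `zk_logs kafka` right after kafka-2 restarts, so that any connection attempts from the broker still fall within the last 50 lines.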
@solsson I am seeing the same error. When I change the namespace of Kafka and ZooKeeper to a different one, the Kafka init-config step reports this error during initialization:
```
+ cp /etc/kafka-configmap/log4j.properties /etc/kafka/
+ KAFKA_BROKER_ID=2
+ SEDS=("s/#init#broker.id=#init#/broker.id=$KAFKA_BROKER_ID/")
+ LABELS=kafka-broker-id=2
+ ANNOTATIONS=
+ hash kubectl
++ kubectl get node metrosecurity-2 '-o=go-template={{index .metadata.labels "failure-domain.beta.kubernetes.io/zone"}}'
Error from server (Forbidden): nodes "metrosecurity-2" is forbidden: User "system:serviceaccount:zhihuiaj:default" cannot get resource "nodes" in API group "" at the cluster scope
+ ZONE=
```
If the namespace is kafka, the cluster initializes normally and the connection to ZooKeeper works. But this is not what I want; my project lives in other namespaces.
So Kafka's namespace is kafka and ZooKeeper is in another namespace. To solve the cross-namespace problem, I created a Service in namespace kafka:
```
apiVersion: v1
kind: Service
metadata:
  name: kafka-zk-port2
  namespace: kafka
spec:
  ports:
  - name: kafka-port2
    port: 2181
    protocol: TCP
    targetPort: 2181
  sessionAffinity: None
  type: ExternalName
  externalName: zk-cli
```
Then it reported the same error as above:
[2019-06-26 05:52:11,975] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-06-26 05:52:12,472] INFO starting (kafka.server.KafkaServer)
[2019-06-26 05:52:12,472] INFO Connecting to zookeeper on zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181 (kafka.server.KafkaServer)
[2019-06-26 05:52:12,492] INFO [ZooKeeperClient] Initializing a new session to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:12,497] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:host.name=kafka-0.kafka-cluster.kafka.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.version=11.0.2 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.home=/usr/lib/jvm/jdk-11 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.class.path=/opt/kafka/libs/extensions/*:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka/bin/../libs/connect-api-2.2.1.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka/bin/../libs/connect-file-2.2.1.jar:/opt/kafka/bin/../libs/connect-json-2.2.1.jar:/opt/kafka/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka/bin/../libs/extensions:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-c
lient-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka/bin/../libs/scala-library-2.12.8.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.11.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka/bin/../libs/zstd-jni-1.3.8-1.jar (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.version=5.1.9-050109-generic (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.dir=/opt/kafka (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,498] INFO Initiating client connection, connectString=zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@6138e79a (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,509] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:12,517] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:12,524] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:12,527] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,251] INFO Opening socket connection to server zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,252] INFO Socket connection established to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,252] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,058] INFO Opening socket connection to server zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,059] INFO Socket connection established to zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,059] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,985] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,985] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,986] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,766] INFO Opening socket connection to server zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,767] INFO Socket connection established to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,768] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,208] INFO Opening socket connection to server zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,209] INFO Socket connection established to zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,210] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,384] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,384] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,385] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,512] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:19,265] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:19,269] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:19,270] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:19,278] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:96)
at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:361)
at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:385)
at kafka.server.KafkaServer.startup(KafkaServer.scala:205)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:75)
at kafka.Kafka.main(Kafka.scala)
[2019-06-26 05:52:19,281] INFO shutting down (kafka.server.KafkaServer)
[2019-06-26 05:52:19,291] INFO shut down completed (kafka.server.KafkaServer)
[2019-06-26 05:52:19,291] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-06-26 05:52:19,294] INFO shutting down (kafka.server.KafkaServer)
@amateu It looks like yours is a custom setup with ExternalName for ZooKeeper. Why don't you edit `zookeeper.connect` in Kafka's config instead? In addition, you seem to have quite specific RBAC in your cluster, and you probably need to customize the RBAC resources.
With @selkabli's issue, what is most interesting is that only kafka-2 fails. I think in your setup, @amateu, all brokers will fail.
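The Forbidden error quoted earlier names the exact permission that is missing: `get` on `nodes` at cluster scope for the service account `zhihuiaj:default`. One way to customize RBAC for a different namespace might look like the sketch below. The helper name `grant_node_read` and the role name `node-reader` are assumptions chosen for illustration, and the commands need cluster-admin rights.

```shell
# Hypothetical helper; "node-reader" is an illustrative role name.
grant_node_read() {
  ns="$1"              # namespace of the service account, e.g. zhihuiaj
  sa="${2:-default}"   # service account the init container runs as
  # Cluster-scoped role that only allows reading Node objects.
  kubectl create clusterrole node-reader --verb=get --resource=nodes
  # Bind the role to the service account in the given namespace.
  kubectl create clusterrolebinding "node-reader-$ns" \
    --clusterrole=node-reader --serviceaccount="$ns:$sa"
}
```

For the error in this thread the call would be `grant_node_read zhihuiaj`. This only addresses the init-config failure; it does not by itself explain a ZooKeeper connection timeout.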
@solsson Yes, all brokers fail. The cause really is RBAC. I tried creating RBAC resources in my project's namespace so that ZooKeeper and Kafka are deployed there instead of in the kafka namespace, but the ZooKeeper connection still timed out.
So I deployed ZooKeeper and Kafka in another clean test environment without RBAC, but the ZooKeeper connection still timed out, the same error as before. Finally, I changed the ZooKeeper YAML, and now both the ZooKeeper and Kafka clusters work normally.
I still can't find the specific reason for the previous problem.
As for @selkabli's issue, I think he might have used hostNetwork: true
@solsson The problem happens only on node1, which is the master of my cluster. Any clues why?
The taint has already been removed from the master, so it's not related to the taint.
That's an important observation. I haven't tried running on a master. I have no clue why the ZooKeeper connection would fail from there.
I'm having the same issue as @selkabli. I am deploying on a bare-metal k8s cluster with local persistent volumes. 1 broker (out of 3) always failed to start correctly.
Never mind, it seems the PV on one of the nodes had a problem, which caused this. I moved the PV to another node and it works fine.