Yolean/kubernetes-kafka

Kafka fails when more than one ZooKeeper instance is up

Opened this issue · 4 comments

Kafka fails with the error below when I have more than one ZooKeeper replica up.
Currently I have 3 ZooKeeper replicas and 3 Kafka replicas.
Kafka keeps going into CrashLoopBackOff. The error is below.
Note: it works fine when only one ZooKeeper replica is provided in the connect string, i.e. when there is only one ZooKeeper replica.

Please help.

[root@devtricorder69-master-01 ~]# kubectl get po -l comp-group=kafka
NAME          READY     STATUS             RESTARTS   AGE
kafka-0       0/1       CrashLoopBackOff   6          8m
zookeeper-0   1/1       Running            0          1h
zookeeper-1   1/1       Running            0          1h
zookeeper-2   1/1       Running            0          1h

[2018-02-16 12:32:43,864] INFO Socket connection established to 10.32.0.110/10.32.0.110:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2018-02-16 12:32:43,866] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
..

[2018-02-16 12:32:49,027] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server '10.32.0.108:2181,10.32.0.110:2181,10.32.0.111:2181' with timeout of 9000 ms
	at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1233)
	at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157)
	at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131)
	at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:106)
	at kafka.utils.ZkUtils$.apply(ZkUtils.scala:88)
	at kafka.server.KafkaServer.initZk(KafkaServer.scala:326)
	at kafka.server.KafkaServer.startup(KafkaServer.scala:187)
	at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:39)
	at kafka.Kafka$.main(Kafka.scala:67)
	at kafka.Kafka.main(Kafka.scala)
[2018-02-16 12:32:49,030] INFO shutting down (kafka.server.KafkaServer)

Maybe this is a generic zookeeper/kafka troubleshooting case? The zookeeper pod name is different from this repo. Is there a fork with your changes?

No, I do not have a fork. I kept "zookeeper" as the name instead of "zoo", and both Kafka and ZooKeeper are deployed in the default namespace. Can you give me any direction on how to debug this, or what I can check? With one ZooKeeper replica Kafka is able to connect, but with multiple replicas it fails.

All I can say then is that I haven't seen an issue like that before. Naming changes may sound cosmetic, but there's a lot that depends on naming.

@csarora Check how many client connections ZK is allowing, and try setting this to 0 for troubleshooting.
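
The comment does not name the setting. Assuming it means ZooKeeper's maxClientCnxns option, which limits concurrent connections from a single client IP to one ensemble member and treats 0 as "no limit", the change might look like the fragment below. The file name is also an assumption; it depends on how the ZooKeeper config is mounted in your setup.

```
# ZooKeeper server config (e.g. zoo.cfg or zookeeper.properties; file name assumed)
# 0 removes the per-client-IP connection limit. Intended for troubleshooting only.
maxClientCnxns=0
```

The ZooKeeper pods would need to be restarted for the change to take effect.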