Yolean/kubernetes-kafka

Kafka svc must be bypassed when topics are not replicated to all instances

solsson opened this issue · 3 comments

We need to investigate how to use the kafka k8s service when topics have fewer replicas than there are kafka instances.

Background: we've noticed some odd behavior with the no-kafka client lib, connecting to kafka:9092, when a kafka instance is lost and the topic isn't replicated to all instances. The client sometimes seems to react as if the topic does not exist.

In 0.10.1 the official Java client seems to deprecate every means of connecting other than bootstrap.servers. That implies the client must know the FQDN or IP of each broker. A Kubernetes service, on the other hand, is basically a round-robin proxy.

https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol says: "In other words, the client needs to somehow find one broker and that broker will tell the client about all the other brokers that exist and what partitions they host. This first broker may itself go down so the best practice for a client implementation is to take a list of two or three urls to bootstrap from. The user can then choose to use a load balancer or just statically configure two or three of their kafka hosts in the clients."
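The bootstrap step the guide describes can be sketched as a toy model in Python. This is not Kafka's wire protocol: the cluster layout, the metadata shape and fake_fetch are hypothetical stand-ins for real brokers and a real Metadata request. The sketch only shows that one reachable seed out of a short list is enough to learn about every broker, including ones that were never listed.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, dict]  # toy shape: {"brokers": {...}, "leaders": {...}}


def bootstrap(seeds: List[str], fetch_metadata: Callable[[str], Metadata]) -> Metadata:
    """Ask each seed in turn; the first broker that answers describes the whole cluster."""
    for seed in seeds:
        try:
            return fetch_metadata(seed)
        except ConnectionError:
            continue  # this seed is down, so fall through to the next one
    raise ConnectionError(f"no bootstrap server answered: {seeds}")


# Hypothetical cluster: three brokers, kafka-0 is down,
# topic "test" partition 0 is led by broker 1.
CLUSTER: Metadata = {
    "brokers": {0: "kafka-0:9092", 1: "kafka-1:9092", 2: "kafka-2:9092"},
    "leaders": {("test", 0): 1},
}
DOWN = {"kafka-0:9092"}


def fake_fetch(seed: str) -> Metadata:
    # Hypothetical stand-in for a real Metadata request over the network.
    if seed in DOWN:
        raise ConnectionError(seed)
    return CLUSTER


metadata = bootstrap(["kafka-0:9092", "kafka-1:9092"], fake_fetch)
print(metadata["brokers"][2])  # learned from kafka-1, never listed as a seed
```

Listing two or three seeds only matters for the first contact; after that the client relies on what the answering broker reports.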

So no-kafka probably performs the correct operations, but expects us to connect using a string like kafka-0:9092,kafka-1:9092,... and that is why going through the standard Kubernetes service fails.

We can get the name each broker believes it has from its logs; see https://github.com/Yolean/kubernetes-kafka/blob/v1.0.0/README.md#start-kafka.

In minikube I get entries like kafka-0.broker.kafka.svc.cluster.local,9092,ListenerName(PLAINTEXT), so I think our bootstrap should be kafka-0.broker.kafka.svc.cluster.local:9092,kafka-1.broker.kafka.svc.cluster.local:9092,kafka-2.broker.kafka.svc.cluster.local:9092. That list will likely remain sufficient if we scale up, too, since the brokers we reach will tell the client about any new ones.
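Both the log lookup and the bootstrap string can be scripted. This sketch assumes the log entry has exactly the host,port,ListenerName(...) shape seen in minikube, and that broker names follow the pod.service.namespace.svc.cluster.local pattern visible above (StatefulSet kafka, headless service broker, namespace kafka). The function names are hypothetical.

```python
import re

# Assumed shape of the endpoint seen in the broker logs, e.g.
# "kafka-0.broker.kafka.svc.cluster.local,9092,ListenerName(PLAINTEXT)".
ENDPOINT = re.compile(r"(?P<host>[A-Za-z0-9.-]+),(?P<port>\d+),ListenerName\(\w+\)")


def endpoint_from_log(line: str) -> str:
    """Turn the host,port,ListenerName(...) triple into host:port."""
    m = ENDPOINT.search(line)
    if m is None:
        raise ValueError(f"no endpoint found in: {line!r}")
    return f"{m.group('host')}:{m.group('port')}"


def bootstrap_servers(replicas: int, statefulset: str = "kafka",
                      service: str = "broker", namespace: str = "kafka",
                      port: int = 9092) -> str:
    """Comma-separated per-pod DNS names, bypassing the round-robin service."""
    return ",".join(
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    )


print(endpoint_from_log("kafka-0.broker.kafka.svc.cluster.local,9092,ListenerName(PLAINTEXT)"))
print(bootstrap_servers(3))
```

bootstrap_servers(3) reproduces the bootstrap string proposed above; passing a larger replica count extends it after a scale-up.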

The shell scripts in the kafka image (0.10.2.0) give a rather confusing picture:

  • ./bin/kafka-topics.sh says --zookeeper is REQUIRED
  • ./bin/kafka-console-consumer.sh says both --bootstrap-server AND --zookeeper are REQUIRED
  • ./bin/kafka-console-producer.sh has neither of the above, but says --broker-list is REQUIRED

And when trying to produce messages I get errors like org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
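One plausible reading of this error, given the round-robin behavior described above: a produce request must reach the partition's leader, and any other broker rejects it. The toy model below is not Kafka's code; the classes, the topic "test" and the leader assignment are hypothetical, and it treats the Kubernetes service as a plain round-robin, as this issue does. It contrasts that proxy with routing that looks up the leader first.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Partition = Tuple[str, int]  # (topic, partition number)


class NotLeaderForPartition(Exception):
    """Toy stand-in for NotLeaderForPartitionException."""


class Broker:
    def __init__(self, broker_id: int, leaders: Dict[Partition, int]):
        self.broker_id = broker_id
        self.leaders = leaders  # partition -> id of its leader broker

    def produce(self, partition: Partition, message: str) -> int:
        # Only the partition leader accepts the write; other brokers reject it.
        if self.leaders.get(partition) != self.broker_id:
            raise NotLeaderForPartition(partition)
        return self.broker_id


def round_robin(brokers: List[Broker]) -> Iterator[Broker]:
    # Simplified view of a Kubernetes service: successive connections hit successive pods.
    return itertools.cycle(brokers)


def route_to_leader(brokers: List[Broker], leaders: Dict[Partition, int],
                    partition: Partition) -> Broker:
    # Metadata-aware routing: look up the leader and connect to it directly.
    # Toy simplification: broker ids double as list indices.
    return brokers[leaders[partition]]


# Hypothetical topic "test" with a single partition led by broker 1 of 3.
leaders = {("test", 0): 1}
brokers = [Broker(i, leaders) for i in range(3)]

proxy = round_robin(brokers)
failures = 0
for _ in range(3):
    try:
        next(proxy).produce(("test", 0), "hello")
    except NotLeaderForPartition:
        failures += 1

leader_id = route_to_leader(brokers, leaders, ("test", 0)).produce(("test", 0), "hello")
print(failures, leader_id)
```

In this model two of every three writes through the proxy fail, while leader-aware routing always succeeds, which matches the intuition behind bypassing the service.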