Kafka svc must be bypassed when topics are not replicated to all instances
solsson opened this issue · 3 comments
We need to investigate how to use the `kafka` k8s service when topics have fewer replicas than there are kafka instances.
Background: we've noticed some oddities using the no-kafka client lib, connecting to `kafka:9092`, if a kafka instance is lost and the topic isn't replicated to all instances. The client sometimes seems to react as if the topic does not exist.
In 0.10.1 the official Java client seems to deprecate other means of connection than `bootstrap.servers`. This implies awareness of the FQDN or IP of each broker. A Kubernetes service, on the other hand, is basically a round-robin proxy.
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol says: "In other words, the client needs to somehow find one broker and that broker will tell the client about all the other brokers that exist and what partitions they host. This first broker may itself go down so the best practice for a client implementation is to take a list of two or three urls to bootstrap from. The user can then choose to use a load balancer or just statically configure two or three of their kafka hosts in the clients."
In other words, no-kafka probably performs the correct operations, but expects us to connect using a string `kafka-0:9092,kafka-1:9092,...`. Hence, the standard Kubernetes service fails.
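The bootstrap behaviour quoted from the protocol guide could be sketched roughly as below. This is a toy illustration, not the actual code of no-kafka or the Java client: `bootstrap` and `fetch_metadata` are hypothetical names, and `fetch_metadata` stands in for a real metadata request to a single broker.

```python
from typing import Callable, Iterable


def bootstrap(servers: Iterable[str], fetch_metadata: Callable[[str], dict]) -> dict:
    """Ask each bootstrap server in turn and return the first metadata reply.

    `fetch_metadata` is a hypothetical callable that raises ConnectionError
    when the given "host:port" cannot be reached.
    """
    errors = []
    for server in servers:
        try:
            return fetch_metadata(server)
        except ConnectionError as exc:
            # This broker is down or unreachable; fall through to the next one.
            errors.append(f"{server}: {exc}")
    raise ConnectionError("no bootstrap server reachable: " + "; ".join(errors))
```

With a list of two or three brokers, losing one of them still leaves another entry to ask, which is the point of passing a list of brokers instead of a single address.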
We can get the actual name that each broker thinks it has, from the logs. See https://github.com/Yolean/kubernetes-kafka/blob/v1.0.0/README.md#start-kafka.
In minikube I get entries like `kafka-0.broker.kafka.svc.cluster.local,9092,ListenerName(PLAINTEXT)` so I think our bootstrap should be `kafka-0.broker.kafka.svc.cluster.local:9092,kafka-1.broker.kafka.svc.cluster.local:9092,kafka-2.broker.kafka.svc.cluster.local:9092`. These three entries will likely be sufficient if we scale up too.
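A minimal sketch of generating that bootstrap string, assuming the brokers keep the `kafka-N.broker.kafka.svc.cluster.local` naming seen in the logs. The function name `bootstrap_servers` and its parameters are made up for illustration; the defaults simply mirror the names above.

```python
def bootstrap_servers(
    replicas: int,
    pod_prefix: str = "kafka",       # pods are named kafka-0, kafka-1, ...
    service: str = "broker",         # service part of the logged broker names
    namespace: str = "kafka",
    cluster_domain: str = "svc.cluster.local",
    port: int = 9092,
) -> str:
    """Build a comma-separated host:port list, one entry per broker pod."""
    hosts = (
        f"{pod_prefix}-{i}.{service}.{namespace}.{cluster_domain}:{port}"
        for i in range(replicas)
    )
    return ",".join(hosts)


if __name__ == "__main__":
    print(bootstrap_servers(3))
```

Calling `bootstrap_servers(3)` yields the three-broker string above. Only the first few brokers need to be listed, since any reachable one can tell the client about the rest.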
The shell scripts in the kafka image (0.10.2.0) give a quite confusing picture:
- `./bin/kafka-topics.sh` says `--zookeeper` is REQUIRED
- `./bin/kafka-console-consumer.sh` says `--bootstrap-server` AND `--zookeeper` are REQUIRED
- `./bin/kafka-console-producer.sh` has neither of the above, but says `--broker-list` is REQUIRED
And when trying to produce messages I get errors like org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.