Yolean/kubernetes-kafka

Broker unresponsive due to NotEnoughReplicasException


One of our brokers was unresponsive, leading to timeouts in clients. It was busy in a loop that logged:

[2017-12-17 21:33:52,534] INFO [GroupCoordinator 1]: Preparing to rebalance group user-sessions-stream with old generation 779 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
[2017-12-17 21:33:52,637] INFO [GroupCoordinator 1]: Stabilized group user-sessions-stream generation 780 (__consumer_offsets-32) (kafka.coordinator.group.GroupCoordinator)
[2017-12-17 21:33:52,637] INFO [GroupCoordinator 1]: Assignment received from leader for group user-sessions-stream for generation 780 (kafka.coordinator.group.GroupCoordinator)
[2017-12-17 21:33:52,638] ERROR [ReplicaManager broker=1] Error processing append operation on partition __consumer_offsets-32 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.NotEnoughReplicasException: Number of insync replicas for partition __consumer_offsets-32 is [1], below required minimum [2]
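
To read the exception: the append to `__consumer_offsets-32` was rejected because only one replica was in sync while two were required, which is what Kafka's `min.insync.replicas` setting enforces for writes that need all acknowledgements. A minimal sketch for pulling those numbers out of such a line, for example to tally affected partitions across a log file. The function name `parse_isr_error` and the regular expression are illustrative assumptions based only on the message format shown above:

```python
import re

# The exception line copied from the log above.
line = (
    "org.apache.kafka.common.errors.NotEnoughReplicasException: "
    "Number of insync replicas for partition __consumer_offsets-32 "
    "is [1], below required minimum [2]"
)

# Assumed message layout, derived from the single log line in this issue.
PATTERN = re.compile(
    r"Number of insync replicas for partition (?P<partition>\S+) "
    r"is \[(?P<isr>\d+)\], below required minimum \[(?P<required>\d+)\]"
)


def parse_isr_error(text):
    """Return (partition, in_sync_count, required_minimum), or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match["partition"], int(match["isr"]), int(match["required"])


partition, isr, required = parse_isr_error(line)
print(f"{partition}: {isr} in-sync, {required} required, short by {required - isr}")
```

A partition that is short by one or more in-sync replicas cannot accept offset commits until either more replicas catch up or the required minimum is lowered.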

This was quite possibly caused by the configuration change in #107.

Solved using #95 with the diff:

diff --git a/maintenance/reassign-paritions-job.yml b/maintenance/reassign-paritions-job.yml
index e9e184e..12e9219 100644
--- a/maintenance/reassign-paritions-job.yml
+++ b/maintenance/reassign-paritions-job.yml
@@ -16,9 +16,9 @@ spec:
           value: zookeeper.kafka:2181
         # the following must be edited per job
         - name: TOPICS
-          value: test-produce-consume,test-kafkacat
+          value: __consumer_offsets
         - name: BROKERS
-          value: 0,2
+          value: 0,1,2
         command:
         - /bin/bash
         - -ce

The above may have helped temporarily, or it may only have stopped the flow of logs for a while.
Actually increasing the number of replicas per partition had better results:

diff --git a/maintenance/reassign-paritions-job.yml b/maintenance/reassign-paritions-job.yml
index e9e184e..0cb4c6a 100644
--- a/maintenance/reassign-paritions-job.yml
+++ b/maintenance/reassign-paritions-job.yml
@@ -16,9 +16,9 @@ spec:
           value: zookeeper.kafka:2181
         # the following must be edited per job
         - name: TOPICS
-          value: test-produce-consume,test-kafkacat
+          value: __consumer_offsets
         - name: BROKERS
-          value: 0,2
+          value: 0,1,2
         command:
         - /bin/bash
         - -ce
@@ -43,6 +43,10 @@ spec:
           echo "# proposed-reassignment.json";
           cat /tmp/proposed-reassignment.json;
 
+          sed -i 's/"replicas":\[.\]/"replicas":[0,1,2]/g' /tmp/proposed-reassignment.json;
+          sed -i 's/,"log_dirs":\["any"\]//g' /tmp/proposed-reassignment.json;
+          cat /tmp/proposed-reassignment.json;
+
           ./bin/kafka-reassign-partitions.sh
           --zookeeper=$ZOOKEEPER
           --execute
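
To make the effect of the two added `sed` lines concrete, here is a rough sketch that applies the same substitutions to a sample plan. The sample JSON is hypothetical (the real contents of `/tmp/proposed-reassignment.json` are not shown in this issue), and the helper names `widen_with_regex` and `widen_with_json` are made up for illustration:

```python
import json
import re

# Hypothetical plan for a topic whose partitions each have one replica.
proposed = (
    '{"version":1,"partitions":['
    '{"topic":"__consumer_offsets","partition":0,"replicas":[2],"log_dirs":["any"]},'
    '{"topic":"__consumer_offsets","partition":1,"replicas":[0],"log_dirs":["any"]}'
    ']}'
)


def widen_with_regex(text):
    """Mirror the two sed substitutions from the diff."""
    # Replace every single-character replica list with [0,1,2].
    text = re.sub(r'"replicas":\[.\]', '"replicas":[0,1,2]', text)
    # Drop log_dirs, which would otherwise keep one entry for three replicas.
    return re.sub(r',"log_dirs":\["any"\]', "", text)


def widen_with_json(text, brokers):
    """Same intent via JSON parsing; not limited to single-replica lists."""
    plan = json.loads(text)
    for entry in plan["partitions"]:
        entry["replicas"] = list(brokers)
        entry.pop("log_dirs", None)
    return json.dumps(plan, separators=(",", ":"))


regex_result = widen_with_regex(proposed)
json_result = widen_with_json(proposed, [0, 1, 2])
print(regex_result)
```

The pattern `\[.\]` matches exactly one character between the brackets, so a partition that already has a list such as `[0,2]` would be left unchanged by the `sed` version. Parsing the JSON avoids that limitation, at the cost of needing more than `sed` inside the job's container.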