Yelp/kafka-utils

Unable to list consumer groups from kafka 0.9.1

yagnasrinath opened this issue · 5 comments

list_topics works, but list_groups hangs forever.
ubuntu@loadts5 ~>kafka-consumer-manager --cluster-type kafkaclusters --cluster-name cluster-1 list_topics --storage kafka "10-0-4-34"
Cluster name: cluster-1, consumer group: 10-0-4-34
Consumer Group ID: 10-0-4-34
Topic: uid_cap
Partitions: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Topic: uid_segment
Partitions: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Topic: ip_segment
Partitions: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Topic: geo_segment
Partitions: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Topic: device_segment
Partitions: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
ubuntu@loadts5 ~>kafka-consumer-manager --cluster-type kafkaclusters --cluster-name cluster-1 list_groups --storage kafka

It usually hangs because the __consumer_offsets topic is very large and scanning the entire topic takes a long time.
The only way to get the list of all consumers in Kafka 0.9 is to scan the __consumer_offsets topic, decode all the messages, and build the list of groups from the committed offsets. The __consumer_offsets topic is usually compacted, so unless you have thousands of groups and thousands of partitions, its size should be pretty small. The scan usually shouldn't take more than 30-60 seconds to complete.
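
As a rough illustration of the scan described above, here is a minimal sketch of turning raw __consumer_offsets records into a group list. It assumes the offset-commit key layout used by Kafka 0.9 as far as I know it (a big-endian int16 version, then group and topic as int16-length-prefixed strings, then an int32 partition, with key version 2 reserved for group metadata). The function names are hypothetical and this is not the kafka-utils implementation; fetching the records from the brokers is left out.

```python
import struct


def _read_string(buf, pos):
    """Read an int16-length-prefixed UTF-8 string; return (text, next_pos)."""
    (length,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    return buf[pos:pos + length].decode("utf-8"), pos + length


def encode_offset_key(group, topic, partition, version=1):
    """Build an offset-commit key (assumed layout), handy for experiments."""
    g, t = group.encode("utf-8"), topic.encode("utf-8")
    return (struct.pack(">h", version)
            + struct.pack(">h", len(g)) + g
            + struct.pack(">h", len(t)) + t
            + struct.pack(">i", partition))


def parse_offset_key(key):
    """Return (group, topic, partition), or None for non-offset keys."""
    (version,) = struct.unpack_from(">h", key, 0)
    if version not in (0, 1):
        return None  # assumed: version 2 keys carry group metadata
    group, pos = _read_string(key, 2)
    topic, pos = _read_string(key, pos)
    (partition,) = struct.unpack_from(">i", key, pos)
    return group, topic, partition


def groups_from_messages(messages):
    """Fold (key, value) records into {group: {topic: set(partitions)}}.

    A record with value None is treated as a tombstone that removes
    the committed offset for that group/topic/partition.
    """
    groups = {}
    for key, value in messages:
        parsed = parse_offset_key(key)
        if parsed is None:
            continue
        group, topic, partition = parsed
        if value is not None:
            groups.setdefault(group, {}).setdefault(topic, set()).add(partition)
            continue
        topics = groups.get(group, {})
        topics.get(topic, set()).discard(partition)
        if topic in topics and not topics[topic]:
            del topics[topic]
        if group in groups and not topics:
            del groups[group]
    return groups
```

Every record in the topic has to pass through a loop like this, which is why the running time tracks the size of __consumer_offsets rather than the number of groups.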

However, there are some bugs in that version of Kafka that crash the log compaction thread. If your __consumer_offsets topic doesn't get compacted, it will grow to gigabytes in size. This will both slow down kafka-consumer-manager and eventually cause consumer groups to get stuck in the rebalance phase.
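
One way to check whether the topic has grown like this is to measure it on disk on a broker. The sketch below assumes the usual layout where each partition lives in a directory named `__consumer_offsets-<partition>` under the broker's log directory. The function name is hypothetical, and the log directory path depends on your `log.dirs` setting.

```python
import os


def consumer_offsets_size_bytes(log_dir):
    """Sum the size of all files in __consumer_offsets-* partition dirs."""
    total = 0
    for entry in os.listdir(log_dir):
        path = os.path.join(log_dir, entry)
        # Skip other topics and any stray files in the log directory.
        if not (entry.startswith("__consumer_offsets-") and os.path.isdir(path)):
            continue
        for name in os.listdir(path):
            total += os.path.getsize(os.path.join(path, name))
    return total
```

Calling it with the broker's log directory (for example the value of `log.dirs`) gives a byte count for that broker only. A result in the gigabytes would suggest compaction is not running.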

Thank you for the response. We do not have that many groups. I tried the same thing with the scripts that come with Kafka, and the result is immediate. Does this mean that kafka-utils also lists consumer groups that do not have active consumers?
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server x.x.x.x:9092 --list
newflumekafka_postback
new-flumekafka-uid-global-cache
newflume31
flumeKafka_bid1
flume
flume-new
newflumekafka_sdk-server
newflumeretargeting-pipeline
new1flumededupe-server
newflumekafka_attribution
flumekafka-adserver
10-0-4-34
newflumekafka-adserver
newflumeimptracer
newflumekafka-lossnotification
newflumecookie-match
newflume1

kafka-consumer-manager also reports groups that are not currently active, as long as their offsets haven't expired yet (iirc the broker side config for that is offsets.retention.minutes).
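
A simplified sketch of that expiry rule, assuming offsets expire a fixed number of minutes after the last commit and that the broker uses 1440 minutes (24 hours), which I believe is the 0.9 default for offsets.retention.minutes. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def offsets_expired(last_commit, now, retention_minutes=1440):
    """True if the last commit is older than the retention window."""
    return now - last_commit > timedelta(minutes=retention_minutes)


# A group that last committed 12 hours ago would still be listed.
last_commit = datetime(2016, 1, 1, 0, 0)
still_listed = not offsets_expired(last_commit, datetime(2016, 1, 1, 12, 0))
```

Under these assumptions, an inactive group stops showing up only once all of its committed offsets have passed the retention window.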

To be honest, I just found a bug that may cause kafka-consumer-manager list_groups to hang. It's not caught in the acceptance tests because it seems to happen with consumers that are extremely active in commits.

Thanks for reporting. I'll submit a PR to fix this.

Thank you. I will be happy to test it! Is there any kind of debug mode that logs what's happening in the background?

You can clone the repository and add something like logging.basicConfig(level=logging.INFO) in the main command module, e.g. kafka_utils/kafka_cluster_manager/main.py.
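
For example, something along these lines near the top of the module would do. The helper name and the log format are only illustrative; the explicit setLevel call is there because basicConfig does nothing if the root logger already has handlers.

```python
import logging


def enable_logging(level=logging.INFO):
    """Send log records at `level` and above to stderr."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    # basicConfig is a no-op when handlers already exist, so set the
    # root level explicitly as well.
    logging.getLogger().setLevel(level)
```

Calling enable_logging() before the command runs should be enough. Passing logging.DEBUG gives more detail at the cost of much noisier output.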

This should be solved in #107; however, it will still take a few minutes for the list_groups command to finish. Scanning the entire __consumer_offsets topic takes quite some time.