Yolean/kubernetes-kafka

Burrow Exporter in CrashLoopBackOff when Kafka cluster has issues

Opened this issue · 3 comments

I based my setup on your Helm charts for Burrow and Burrow exporter, and built the Docker image for Burrow exporter from https://github.com/jirwin/burrow_exporter/.
However, when the Kafka cluster is unhealthy, Burrow and Burrow exporter fail as well, and the pod goes into the CrashLoopBackOff state.
The log follows:

time="2019-07-24T02:16:09Z" level=error msg="error listing clusters. Continuing." err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"
time="2019-07-24T02:16:39Z" level=info msg="Scraping burrow..." timestamp=1563934599815846635
time="2019-07-24T02:16:39Z" level=error msg="error making request" endpoint="http://localhost:8000/v3/kafka" err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"
time="2019-07-24T02:16:39Z" level=error msg="error retrieving cluster details" err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"
time="2019-07-24T02:16:39Z" level=error msg="error listing clusters. Continuing." err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"

We do not have liveness or readiness probes defined for Burrow exporter. How should we health-check it?
My concern is that an issue with the Kafka cluster should not put burrow-exporter into the CrashLoopBackOff state.
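One way to check this by hand is to query the same Burrow endpoint that the exporter fails on in the log above. The following is only a minimal sketch in Go, not part of burrow_exporter: the names `clustersURL`, `checkBurrow`, and `probeTimeout` are hypothetical, and the address `http://localhost:8000` is taken from the log. It reports an error instead of exiting, so an outage in Burrow or Kafka shows up as a message rather than a crash.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// probeTimeout bounds the HTTP request (an arbitrary value for this sketch).
const probeTimeout = 2 * time.Second

// clustersURL builds the Burrow v3 cluster-listing URL seen in the log.
func clustersURL(base string) string {
	return strings.TrimRight(base, "/") + "/v3/kafka"
}

// checkBurrow returns nil only when Burrow answers the cluster listing
// with 200 OK; any connection or status problem is returned as an error.
func checkBurrow(base string) error {
	client := &http.Client{Timeout: probeTimeout}
	resp, err := client.Get(clustersURL(base))
	if err != nil {
		return fmt.Errorf("burrow unreachable: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("burrow returned status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// Address assumed from the log; adjust for your deployment.
	if err := checkBurrow("http://localhost:8000"); err != nil {
		// A probe wrapper could exit non-zero here; this sketch only reports.
		fmt.Println("burrow not ready:", err)
		return
	}
	fmt.Println("burrow ok")
}
```

If the output is "burrow not ready" with "connection refused", the problem is in the Burrow container itself rather than in the exporter, which matches the log above.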

So you're not using https://github.com/Yolean/kubernetes-kafka/tree/master/linkedin-burrow? Do you think that the crash loop can be avoided through configuration changes? Isn't that by Burrow's design?

@solsson I am using pretty much the same thing, except for the Docker images, which I built using the Dockerfile at https://github.com/solsson/burrow_exporter/blob/master/Dockerfile

I had the same error. In the Burrow config file, change servers=["kafka-0.broker:9092", "kafka-1.broker:9092", "kafka-2.broker:9092"] to servers=["bootstrap:9092"]. The service is not broker, so the per-broker addresses will not work. In my case, it worked correctly after this change.

[cluster.local]
class-name="kafka"
servers=[ "bootstrap:9092" ]
topic-refresh=60
offset-refresh=30

[consumer.local]
class-name="kafka"
cluster="local"
servers=[ "bootstrap:9092" ]
group-blacklist=""
group-whitelist=""
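
Before editing the config, it may help to confirm which of the two server lists is actually reachable from inside the pod. The following is a rough sketch in Go under stated assumptions: `unreachableServers` and `dialTimeout` are hypothetical names, and the addresses are the ones quoted in this comment. It only tests name resolution and TCP connectivity, not whether the endpoint speaks the Kafka protocol.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// dialTimeout bounds each connection attempt (an arbitrary value for this sketch).
const dialTimeout = 2 * time.Second

// unreachableServers returns the addresses that could not be resolved or
// connected to over TCP, in the same order as the input.
func unreachableServers(servers []string) []string {
	var failed []string
	for _, addr := range servers {
		conn, err := net.DialTimeout("tcp", addr, dialTimeout)
		if err != nil {
			failed = append(failed, addr)
			continue
		}
		conn.Close()
	}
	return failed
}

func main() {
	// Both variants of the servers setting mentioned in this comment.
	candidates := []string{
		"bootstrap:9092",
		"kafka-0.broker:9092",
		"kafka-1.broker:9092",
		"kafka-2.broker:9092",
	}
	for _, addr := range unreachableServers(candidates) {
		fmt.Println("cannot reach", addr)
	}
}
```

Any address printed here is one that Burrow would also fail to reach from the same pod, so it should not appear in the servers list.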