seglo/kafka-lag-exporter

Implement backoff strategy for Kafka connections in Kafka Lag Exporter

seglo opened this issue · 0 comments

When Kafka Lag Exporter is installed in a fresh cluster, it fails if any configured or discovered Kafka cluster cannot be reached. Kafka Lag Exporter automatically discovers Strimzi Kafka clusters by watching for Kafka CRDs, but on first install it may detect a Kafka CRD before that Kafka cluster has finished coming online. The connection attempt then fails, and Kafka Lag Exporter does not try to connect again.

The current workaround is to delete the pod and let its deployment recreate it once the Kafka clusters are online. Since Kafka Lag Exporter can support multiple clusters, I would like to add a backoff strategy to connection attempts so that it keeps trying to connect to each cluster indefinitely.
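
To make the idea concrete, here is a minimal sketch of an exponential backoff retry loop in plain Scala. It is not the project's implementation: `BackoffSketch`, `backoffDelay`, `retryForever`, and the `connect` and `sleep` parameters are hypothetical names chosen for illustration, and the sketch omits jitter to keep the delays easy to follow.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Hypothetical helper, for illustration only (not part of Kafka Lag Exporter).
object BackoffSketch {

  // Delay before retry number `attempt` (0-based):
  // minBackoff * 2^attempt, capped at maxBackoff.
  def backoffDelay(attempt: Int,
                   minBackoff: FiniteDuration,
                   maxBackoff: FiniteDuration): FiniteDuration = {
    // Cap the exponent so the multiplication cannot overflow to infinity.
    val factor = math.pow(2.0, math.min(attempt, 30).toDouble)
    val millis = math.min(minBackoff.toMillis * factor, maxBackoff.toMillis.toDouble)
    millis.toLong.millis
  }

  // Calls `connect` until it returns without throwing, waiting between
  // attempts. `sleep` is injectable so the loop can be tested without waiting.
  @tailrec
  def retryForever[A](connect: () => A,
                      minBackoff: FiniteDuration,
                      maxBackoff: FiniteDuration,
                      sleep: FiniteDuration => Unit = d => Thread.sleep(d.toMillis),
                      attempt: Int = 0): A =
    Try(connect()) match {
      case Success(client) => client
      case Failure(e) =>
        val delay = backoffDelay(attempt, minBackoff, maxBackoff)
        println(s"Attempt ${attempt + 1} failed (${e.getMessage}); retrying in $delay")
        sleep(delay)
        retryForever(connect, minBackoff, maxBackoff, sleep, attempt + 1)
    }
}
```

With `minBackoff = 1.second` and `maxBackoff = 30.seconds`, the waits would be 1s, 2s, 4s, 8s, 16s, and then 30s for every later attempt. Blocking with `Thread.sleep` is only acceptable in a sketch; since the collectors are Akka Typed actors, a supervision strategy such as `SupervisorStrategy.restartWithBackoff` may be a better fit, so that one unreachable cluster does not hold up the others.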

Logs for kafka-lag-exporter pod:

2019-01-17 13:26:33,410 WARN  org.apache.kafka.clients.ClientUtils  - Couldn't resolve server pipelines-strimzi-kafka-bootstrap.lightbend:9092 from bootstrap.servers as DNS resolution failed for pipelines-strimzi-kafka-bootstrap.lightbend
2019-01-17 13:26:33,423 ERROR akka.actor.OneForOneStrategy akka://kafkalagexporterapp/user/consumer-group-collector-pipelines-strimzi - Failed create new KafkaAdminClient
akka.actor.ActorInitializationException: akka://kafkalagexporterapp/user/consumer-group-collector-pipelines-strimzi: exception during creation
        at akka.actor.ActorInitializationException$.apply(Actor.scala:193)
        at akka.actor.ActorCell.create(ActorCell.scala:669)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:523)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:545)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:283)
        at akka.dispatch.Mailbox.run(Mailbox.scala:224)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.kafka.common.KafkaException: Failed create new KafkaAdminClient
        at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:378)
        at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:54)
        at com.lightbend.kafkalagexporter.KafkaClient$.com$lightbend$kafkalagexporter$KafkaClient$$createAdminClient(KafkaClient.scala:42)
        at com.lightbend.kafkalagexporter.KafkaClient.<init>(KafkaClient.scala:71)
        at com.lightbend.kafkalagexporter.KafkaClient$.apply(KafkaClient.scala:18)
        at com.lightbend.kafkalagexporter.MainApp$.$anonfun$clientCreator$1(MainApp.scala:26)
        at com.lightbend.kafkalagexporter.ConsumerGroupCollector$.$anonfun$init$1(ConsumerGroupCollector.scala:47)
        at akka.actor.typed.Behavior$DeferredBehavior$$anon$1.apply(Behavior.scala:219)
        at akka.actor.typed.Behavior$.start(Behavior.scala:300)
        at akka.actor.typed.internal.adapter.ActorAdapter.start(ActorAdapter.scala:145)
        at akka.actor.typed.internal.adapter.ActorAdapter.preStart(ActorAdapter.scala:140)
        at akka.actor.Actor.aroundPreStart(Actor.scala:528)
        at akka.actor.Actor.aroundPreStart$(Actor.scala:528)
        at akka.actor.typed.internal.adapter.ActorAdapter.aroundPreStart(ActorAdapter.scala:21)
        at akka.actor.ActorCell.create(ActorCell.scala:652)
        ... 9 common frames omitted
Caused by: org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
        at org.apache.kafka.clients.ClientUtils.parseAndValidateAddresses(ClientUtils.java:86)
        at org.apache.kafka.clients.admin.KafkaAdminClient.<init>(KafkaAdminClient.java:417)
        at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:371)
        ... 23 common frames omitted