Mirus failing to start when brokers are deployed to new IPs
Hari4AMQ opened this issue · 7 comments
Hi Team,
I had a Mirus pipeline running from a source Kafka cluster to a destination server, and I used a DNS connection URL in the connection string.
I then spun up the destination Kafka cluster on new servers to upgrade Kafka from 1.1 to 2.1.
Since then, Mirus fails to start with the following exception:
[2019-03-27 09:21:21,458] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
org.apache.kafka.connect.errors.ConnectException: Error while attempting to create/find topic(s) 'mirus-offsets'
at org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:255)
at org.apache.kafka.connect.storage.KafkaOffsetBackingStore$1.run(KafkaOffsetBackingStore.java:99)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:127)
at org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:109)
at org.apache.kafka.connect.runtime.Worker.start(Worker.java:174)
at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:114)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:215)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [Topic authorization failed.]
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
at org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:228)
... 11 more
Caused by: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [Topic authorization failed.]
[2019-03-27 09:21:21,480] INFO DefaultSessionIdManager workerName=node0 (org.eclipse.jetty.server.session)
Mirus stops after writing the above error to the logs.
I also verified that the mirus-* topics have the required ACLs.
Please advise on how to fix it.
I tried recreating the mirus-status, mirus-config and mirus-offsets topics and granting the required ACLs, but Mirus is still down.
I changed logging to DEBUG, but it gives no extra information about the issue. Is there anything I'm missing?
The exception says "TopicAuthorizationException: Not authorized to access topics: [Topic authorization failed.]", so Mirus doesn't have permission to access your Kafka cluster, at least on the admin topics. I would look at your Kafka ACLs.
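As a rough aid when auditing ACLs, the sketch below compares the operations granted on each admin topic against a required set. The required set (Describe, Read, Write and Create on mirus-config, mirus-offsets and mirus-status) is an assumption drawn from how this thread was eventually resolved, not something taken from the Mirus or Kafka documentation; the function name missing_operations and the dictionary layout are hypothetical. A grant of "All" is treated as covering every operation, and a topic key of "*" stands for a wildcard topic ACL.

```python
# Hypothetical ACL audit helper: a sketch, not part of Mirus or Kafka.
# ASSUMPTION: the worker needs these operations on each internal topic.
# Create is included because granting it is what fixed this issue.
REQUIRED_OPS = {"Describe", "Read", "Write", "Create"}
ADMIN_TOPICS = ("mirus-config", "mirus-offsets", "mirus-status")


def missing_operations(granted):
    """Return {topic: sorted missing ops} for each admin topic with gaps.

    `granted` maps a topic name (or "*" for a wildcard topic ACL) to the
    set of operations allowed for the Mirus principal. "All" is treated
    as covering every operation.
    """
    gaps = {}
    wildcard = set(granted.get("*", set()))
    for topic in ADMIN_TOPICS:
        ops = wildcard | set(granted.get(topic, set()))
        if "All" in ops:
            continue  # nothing can be missing
        missing = REQUIRED_OPS - ops
        if missing:
            gaps[topic] = sorted(missing)
    return gaps


if __name__ == "__main__":
    # Read/Write/Describe granted on every admin topic, but no Create.
    acls = {t: {"Describe", "Read", "Write"} for t in ADMIN_TOPICS}
    print(missing_operations(acls))
    # prints {'mirus-config': ['Create'], 'mirus-offsets': ['Create'], 'mirus-status': ['Create']}
```

The wildcard and per-topic grants are unioned because Kafka allows a principal's access to come from either kind of ACL.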
@pdavidson100 I've added the required ACLs and also granted open permissions (*) on the admin topics.
Also, if there is only an authorization issue, Mirus keeps running but repeatedly throws the exception below:
[2019-04-01 08:52:47,260] WARN [Consumer clientId=consumer-1, groupId=newmirus-ost-server] Not authorized to read from topic mirus-offsets. (org.apache.kafka.clients.consumer.internals.Fetcher)
[2019-04-01 08:52:47,260] ERROR Error polling: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [mirus-offsets] (org.apache.kafka.connect.util.KafkaBasedLog)
But in my case, I have the required ACLs on the admin topics.
Mirus was running until the destination Kafka cluster was migrated to new server IPs (note that I'm using a VIP URL in the Mirus connection string, and the VIP URL resolves to the new servers as well).
I tried changing the admin topic names that Mirus uses, but the issue persists.
Is there any way I can clear the cluster metadata from the Kafka Monitor thread? (I tried clearing messages from all 3 admin topics, but it didn't help.) I ask because the logs say that the "Mirus worker process" started, but it errors out when trying to start the "task monitor thread".
[2019-04-01 17:57:50,547] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect)
[2019-04-01 17:57:50,548] INFO Starting a task monitor thread... (com.salesforce.mirus.HerderStatusMonitor)
[2019-04-01 17:57:53,396] ERROR Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
org.apache.kafka.connect.errors.ConnectException: Error while attempting to create/find topic(s) 'mirus-offsets'
at org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:255)
at org.apache.kafka.connect.storage.KafkaOffsetBackingStore$1.run(KafkaOffsetBackingStore.java:99)
at org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:127)
at org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:109)
at org.apache.kafka.connect.runtime.Worker.start(Worker.java:174)
at org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:114)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:215)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [Topic authorization failed.]
at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
at org.apache.kafka.connect.util.TopicAdmin.createTopics(TopicAdmin.java:228)
... 11 more
Caused by: org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [Topic authorization failed.]
[2019-04-01 17:57:53,398] INFO TaskMonitor thread stopping (com.salesforce.mirus.HerderStatusMonitor)
[2019-04-01 17:57:53,398] INFO TaskMonitor thread stopped (com.salesforce.mirus.HerderStatusMonitor)
[2019-04-01 17:57:53,398] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect)
[2019-04-01 17:57:53,399] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer)
[2019-04-01 17:57:53,402] INFO Stopped http_8090@1a1d3c1a{HTTP/1.1,[http/1.1]}{0.0.0.0:8090} (org.eclipse.jetty.server.AbstractConnector)
[2019-04-01 17:57:53,402] INFO node0 Stopped scavenging (org.eclipse.jetty.server.session)
[2019-04-01 17:57:53,411] INFO Stopped o.e.j.s.ServletContextHandler@35e52059{/,null,UNAVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler)
[2019-04-01 17:57:53,412] INFO REST server stopped (org.apache.kafka.connect.runtime.rest.RestServer)
[2019-04-01 17:57:53,412] INFO Herder stopping (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2019-04-01 17:57:58,417] INFO Herder stopped (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2019-04-01 17:57:58,417] INFO Kafka Connect stopped (org.apache.kafka.connect.runtime.Connect)
The exception is happening when the Kafka Connect Worker tries to initialize the offset backing store, so this is not an issue in the Mirus codebase. The exception is quite explicit about not being authorized to create or find the Kafka Connect offsets topic. You need to create the offsets topic, or give Mirus permissions to create it, as you would for any Kafka Connect cluster.
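To illustrate why a topic that already exists, with open read/write permissions, can still trigger this error at startup, here is a simplified model of the create-or-find step. ASSUMPTION: it models the broker checking Create authorization for each requested topic before checking whether the topic already exists, and the worker treating "already exists" as success but an authorization failure as fatal. It is an illustration of that ordering under those assumptions, not the real TopicAdmin or broker code; every name in it is hypothetical.

```python
# Simplified model of a Connect worker's "create or find" startup step.
# ASSUMPTION: the broker checks Create authorization before existence,
# and the worker tolerates "already exists" but not auth failures.
# All names are hypothetical; this is not the real TopicAdmin/broker code.


class TopicAuthorizationError(Exception):
    """Stands in for Kafka's TopicAuthorizationException."""


def broker_create_topic(topic, existing_topics, create_allowed):
    """Return the modelled broker result code for one create request."""
    if topic not in create_allowed:
        return "TOPIC_AUTHORIZATION_FAILED"  # checked before existence
    if topic in existing_topics:
        return "TOPIC_ALREADY_EXISTS"
    existing_topics.add(topic)
    return "NONE"  # topic was created


def create_or_find(topic, existing_topics, create_allowed):
    """Worker side: succeed if created or found, raise on auth failure."""
    result = broker_create_topic(topic, existing_topics, create_allowed)
    if result == "TOPIC_AUTHORIZATION_FAILED":
        raise TopicAuthorizationError(
            f"Error while attempting to create/find topic(s) '{topic}'")
    return "created" if result == "NONE" else "found existing"


if __name__ == "__main__":
    existing = {"mirus-offsets"}
    try:
        # Topic exists, but the principal lacks Create on it.
        create_or_find("mirus-offsets", existing, create_allowed=set())
    except TopicAuthorizationError as err:
        print("startup fails:", err)
    # Same topic once Create is granted: treated as an existing topic.
    print(create_or_find("mirus-offsets", existing, {"mirus-offsets"}))
```

Under this model, pre-creating the topic or granting Read/Write alone does not help; only granting Create lets the request reach the "already exists" path.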
Note that clearing the admin topics clears all state, so there is no need to worry about the Kafka Monitor thread state.
@pdavidson100 I have created the offsets topic, it has open permissions (*), and auto.create.topics.enable=true is set as well.
Surprisingly, I'm still getting the above exception. I think it's something to do with Kafka Connect failing to pick up the new IPs and instead trying to connect to the old servers.
The issue was resolved by adding the "create" permission on all 3 admin topics. Thanks for your support!