Yolean/kubernetes-kafka

Connection refused - Failed to get broker metrics with Kafka Manager

solsson opened this issue · 10 comments

Topic management works, but the metrics part of Kafka Manager is non-functional. Logs display repeated errors like:

[error] k.m.a.c.BrokerViewCacheActor - Failed to get broker metrics for BrokerIdentity(1,10.132.0.2,5555,false,true,Map(PLAINTEXT -> 32401))
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10.132.0.2; nested exception is: 
	java.net.ConnectException: Connection refused (Connection refused)]
	at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369) ~[na:1.8.0_144]
	at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270) ~[na:1.8.0_144]
	at kafka.manager.jmx.KafkaJMX$.doWithConnection(KafkaJMX.scala:57) ~[kafka-manager.kafka-manager-1.3.3.18-sans-externalized.jar:na]
	at kafka.manager.actor.cluster.BrokerViewCacheActor$$anonfun$kafka$manager$actor$cluster$BrokerViewCacheActor$$updateBrokerMetrics$1$$anonfun$apply$27$$anonfun$apply$3.apply$mcV$sp(BrokerViewCacheActor.scala:358) ~[kafka-manager.kafka-manager-1.3.3.18-sans-externalized.jar:na]
	at kafka.manager.actor.cluster.BrokerViewCacheActor$$anonfun$kafka$manager$actor$cluster$BrokerViewCacheActor$$updateBrokerMetrics$1$$anonfun$apply$27$$anonfun$apply$3.apply(BrokerViewCacheActor.scala:355) ~[kafka-manager.kafka-manager-1.3.3.18-sans-externalized.jar:na]
	at kafka.manager.actor.cluster.BrokerViewCacheActor$$anonfun$kafka$manager$actor$cluster$BrokerViewCacheActor$$updateBrokerMetrics$1$$anonfun$apply$27$$anonfun$apply$3.apply(BrokerViewCacheActor.scala:355) ~[kafka-manager.kafka-manager-1.3.3.18-sans-externalized.jar:na]
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) ~[org.scala-lang.scala-library-2.11.12.jar:na]
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) ~[org.scala-lang.scala-library-2.11.12.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_144]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_144]
Caused by: javax.naming.ServiceUnavailableException: null
	at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:136) ~[na:1.8.0_144]
	at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205) ~[na:1.8.0_144]
	at javax.naming.InitialContext.lookup(InitialContext.java:417) ~[na:1.8.0_144]
	at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1955) ~[na:1.8.0_144]
	at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1922) ~[na:1.8.0_144]
	at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:287) ~[na:1.8.0_144]
	at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270) ~[na:1.8.0_144]
	at kafka.manager.jmx.KafkaJMX$.doWithConnection(KafkaJMX.scala:57) ~[kafka-manager.kafka-manager-1.3.3.18-sans-externalized.jar:na]
	at kafka.manager.actor.cluster.BrokerViewCacheActor$$anonfun$kafka$manager$actor$cluster$BrokerViewCacheActor$$updateBrokerMetrics$1$$anonfun$apply$27$$anonfun$apply$3.apply$mcV$sp(BrokerViewCacheActor.scala:358) ~[kafka-manager.kafka-manager-1.3.3.18-sans-externalized.jar:na]
	at kafka.manager.actor.cluster.BrokerViewCacheActor$$anonfun$kafka$manager$actor$cluster$BrokerViewCacheActor$$updateBrokerMetrics$1$$anonfun$apply$27$$anonfun$apply$3.apply(BrokerViewCacheActor.scala:355) ~[kafka-manager.kafka-manager-1.3.3.18-sans-externalized.jar:na]
Caused by: java.rmi.ConnectException: Connection refused to host: 10.132.0.2; nested exception is: 
	java.net.ConnectException: Connection refused (Connection refused)
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619) ~[na:1.8.0_144]
	at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216) ~[na:1.8.0_144]
	at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202) ~[na:1.8.0_144]
	at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:338) ~[na:1.8.0_144]
	at sun.rmi.registry.RegistryImpl_Stub.lookup(RegistryImpl_Stub.java:112) ~[na:1.8.0_144]
	at com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:132) ~[na:1.8.0_144]
	at com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:205) ~[na:1.8.0_144]
	at javax.naming.InitialContext.lookup(InitialContext.java:417) ~[na:1.8.0_144]
	at javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1955) ~[na:1.8.0_144]
	at javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1922) ~[na:1.8.0_144]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0_144]
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[na:1.8.0_144]
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[na:1.8.0_144]
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[na:1.8.0_144]
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_144]
	at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_144]
	at java.net.Socket.connect(Socket.java:538) ~[na:1.8.0_144]
	at java.net.Socket.<init>(Socket.java:434) ~[na:1.8.0_144]
	at java.net.Socket.<init>(Socket.java:211) ~[na:1.8.0_144]
	at sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:40) ~[na:1.8.0_144]

Based on the first line of the error message, it looks like Kafka Manager uses the IP from the "outside" listener. That's a host IP, so nothing answers there on the JMX port (unless we also expose JMX as a hostPort).
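To make the failure concrete, here is a minimal sketch of the address a JMX client ends up dialing, using the host and JMX port from the BrokerIdentity in the log above. The URL layout is the standard JMX-over-RMI form; the helper names `jmx_service_url` and `port_is_open` are made up for illustration and are not part of Kafka Manager.

```python
import socket


def jmx_service_url(host: str, jmx_port: int) -> str:
    """Build the standard JMX-over-RMI service URL for a host and port."""
    return f"service:jmx:rmi:///jndi/rmi://{host}:{jmx_port}/jmxrmi"


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


# Values taken from the log line: BrokerIdentity(1,10.132.0.2,5555,...)
url = jmx_service_url("10.132.0.2", 5555)
print(url)
```

Running `port_is_open("10.132.0.2", 5555)` from inside the Kafka Manager pod is a quick way to check whether the refusal happens at the TCP level, before RMI is involved at all.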

Based on 4c202f4 I think we list listeners in the proper order.

The two commits look very relevant.

Indeed. Maybe something will behave differently with next release. However, our outside listener is also PLAINTEXT (by default). A single PLAINTEXT listener probably works.

@solsson the configuration to take into account is advertised.listeners (or listeners). The order does matter, as the official documentation mentions. The configuration you mention here is not the one relevant for zookeeper consumers (I might have misinterpreted it, though).
Placing the internal listener first makes everything work as expected.
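A rough sketch of why the order could matter. It assumes (this is an assumption, not something confirmed in this thread) that the broker host seen by tools reading Zookeeper comes from the first listener whose security protocol is PLAINTEXT. The protocol map below is also assumed, based on the remark that the outside listener is PLAINTEXT by default. The function names are hypothetical.

```python
def parse_listeners(value):
    """Parse 'NAME://host:port,...' into an ordered list of (name, host, port)."""
    result = []
    for item in value.split(","):
        name, _, address = item.strip().partition("://")
        host, _, port = address.rpartition(":")
        result.append((name, host, int(port)))
    return result


def parse_protocol_map(value):
    """Parse 'NAME:PROTOCOL,...' into a dict."""
    return dict(pair.strip().split(":", 1) for pair in value.split(","))


def first_plaintext_endpoint(listeners, protocol_map):
    """ASSUMPTION: the first listener mapped to PLAINTEXT wins."""
    for name, host, port in listeners:
        if protocol_map.get(name) == "PLAINTEXT":
            return name, host, port
    return None


# Assumed map: both listener names use the PLAINTEXT protocol.
protocols = parse_protocol_map("PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT")

outside_first = parse_listeners("OUTSIDE://172.31.221.5:32400,PLAINTEXT://:9092")
inside_first = parse_listeners("PLAINTEXT://:9092,OUTSIDE://172.31.221.5:32400")

print(first_plaintext_endpoint(outside_first, protocols))
print(first_plaintext_endpoint(inside_first, protocols))
```

Under that assumption, listing OUTSIDE first selects the host IP, and listing the internal listener first selects the in-cluster address, which is where the JMX port is actually reachable.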

That's interesting. From the commit comment you reference, it looks like I made the switch based on Kafka Manager. Switching back is a risky change to make now because I don't know what it affects.

Do you mean that the referenced configuration is relevant to Kafka clients, but not to stuff that contacts Zookeeper directly?

Hi guys, sorry, this may be a dumb question, but why is advertised.listeners set to point to the k8s host IP? (#init#advertised.listeners=OUTSIDE://#init#,PLAINTEXT://:9092 results in advertised.listeners=OUTSIDE://172.31.221.5:32400,PLAINTEXT://:9092.) Nothing listens on 3240 on the hosts.

Why isn't it the pod's hostname, which would resolve to a cluster IP where a connection on the JMX port would succeed?

@TattiQ That's for #78 but see also #187

Thanks for the pointer, implemented and verified with #251.

@solsson: I've been experimenting with microk8s and the Confluent https://github.com/confluentinc/cp-helm-charts repo. Looking at Confluent's broker configuration:

cat  /etc/kafka/kafka.properties

broker.id=0
zookeeper.connect=my-confluent-cp-zookeeper-headless:2181
advertised.listeners=PLAINTEXT://my-confluent-cp-kafka-0.my-confluent-cp-kafka-headless.default:9092,EXTERNAL://10.10.10.10:31090
offsets.topic.replication.factor=3
heap.opts=-Xms512M -Xmx512M
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT
log.dirs=/opt/kafka/data-0/logs
listeners=PLAINTEXT://0.0.0.0:9092,EXTERNAL://0.0.0.0:31090
jmx.port=5555

I'd say this confirms that we made the proper change (switching the order of OUTSIDE and PLAINTEXT): Confluent also lists the internal listener first.
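For comparing other setups against a config like this one, a small sketch that reads a properties file and reports the listener order plus any listener name missing from listeners or listener.security.protocol.map. The helper names are hypothetical, and the sample text is an excerpt of the file shown above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def listener_names(value):
    """Return listener names in the order they are declared."""
    return [item.strip().partition("://")[0] for item in value.split(",")]


def check_listeners(props):
    """Return (advertised names in order, list of consistency problems)."""
    advertised = listener_names(props["advertised.listeners"])
    bound = listener_names(props["listeners"])
    mapped = {
        pair.split(":", 1)[0].strip()
        for pair in props["listener.security.protocol.map"].split(",")
    }
    problems = []
    for name in advertised:
        if name not in bound:
            problems.append(f"{name} is advertised but not in listeners")
        if name not in mapped:
            problems.append(f"{name} has no entry in listener.security.protocol.map")
    return advertised, problems


sample = """
advertised.listeners=PLAINTEXT://my-confluent-cp-kafka-0.my-confluent-cp-kafka-headless.default:9092,EXTERNAL://10.10.10.10:31090
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,EXTERNAL:PLAINTEXT
listeners=PLAINTEXT://0.0.0.0:9092,EXTERNAL://0.0.0.0:31090
jmx.port=5555
"""

order, problems = check_listeners(parse_properties(sample))
print("advertised order:", order)
print("problems:", problems)
```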

Great. Working on Kafka Manager upgrade in #257.