WARN Failed to resolve address: zoo-1.zoo
paltaa opened this issue · 15 comments
Hey, trying to get ZooKeeper running, but it seems the pods cannot see each other. I guess I probably have a problem with labels and selectors. Here are my files:
configmap

```yaml
kind: ConfigMap
metadata:
  name: zookeeper-config
  namespace: whitenfv
  labels:
    name: zookeeper
    system: whitenfv
    app: zookeeper
apiVersion: v1
data:
  init.sh: |-
    #!/bin/bash
    set -x

    [ -z "$ID_OFFSET" ] && ID_OFFSET=1
    export ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-} + $ID_OFFSET))
    echo "${ZOOKEEPER_SERVER_ID:-1}" | tee /var/lib/zookeeper/data/myid
    cp -Lur /etc/kafka-configmap/* /etc/kafka/
    sed -i "s/server\.$ZOOKEEPER_SERVER_ID\=[a-z0-9.-]*/server.$ZOOKEEPER_SERVER_ID=0.0.0.0/" /etc/kafka/zookeeper.properties

  zookeeper.properties: |-
    tickTime=2000
    dataDir=/var/lib/zookeeper/data
    dataLogDir=/var/lib/zookeeper/log
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=pzoo-0.pzoo:2888:3888:participant
    server.2=pzoo-1.pzoo:2888:3888:participant
    server.3=pzoo-2.pzoo:2888:3888:participant
    server.4=zoo-0.zoo:2888:3888:participant
    server.5=zoo-1.zoo:2888:3888:participant

  log4j.properties: |-
    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
    # Suppress connection log messages, three lines per livenessProbe execution
    log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN
    log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN
```
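As a side note on the `init.sh` above: each server's id is derived from the StatefulSet pod name's ordinal suffix plus `ID_OFFSET`. A minimal sketch of that arithmetic (the pod name `zoo-1` and `ID_OFFSET=4` here are just example values matching this thread's manifests):

```shell
# ${HOSTNAME##*-} strips everything up to the last "-", leaving the pod ordinal.
HOSTNAME=zoo-1
ID_OFFSET=4
ZOOKEEPER_SERVER_ID=$(( ${HOSTNAME##*-} + ID_OFFSET ))
echo "$ZOOKEEPER_SERVER_ID"   # prints 5, i.e. the server.5=zoo-1.zoo entry
```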
zoo service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zoo
  namespace: whitenfv
  labels:
    name: zookeeper
    system: whitenfv
    app: zookeeper
spec:
  ports:
  - port: 2888
    name: peer
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    name: zookeeper
    system: whitenfv
```
pzoo service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pzoo
  namespace: whitenfv
  labels:
    name: zookeeper
    system: whitenfv
    app: zookeeper
spec:
  ports:
  - port: 2888
    name: peer
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    name: zookeeper
    system: whitenfv
```
zookeeper service

```yaml
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: whitenfv
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zookeeper
    namespace: whitenfv
```
and finally the statefulset

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zoo
  namespace: whitenfv
  labels:
    name: zookeeper
    system: whitenfv
    app: zookeeper
spec:
  selector:
    matchLabels:
      app: zookeeper
  serviceName: "zoo"
  replicas: 2
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        app: zookeeper
        namespace: whitenfv
      annotations:
    spec:
      terminationGracePeriodSeconds: 10
      initContainers:
      - name: init-config
        image: solsson/kafka-initutils@sha256:18bf01c2c756b550103a99b3c14f741acccea106072cd37155c6d24be4edd6e2
        command: ['/bin/bash', '/etc/kafka-configmap/init.sh']
        env:
        - name: ID_OFFSET
          value: "4"
        volumeMounts:
        - name: configmap
          mountPath: /etc/kafka-configmap
        - name: config
          mountPath: /etc/kafka
        - name: data
          mountPath: /var/lib/zookeeper/data
      containers:
      - name: zookeeper
        image: solsson/kafka:1.0.2@sha256:7fdb326994bcde133c777d888d06863b7c1a0e80f043582816715d76643ab789
        env:
        - name: KAFKA_LOG4J_OPTS
          value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
        command:
        - ./bin/zookeeper-server-start.sh
        - /etc/kafka/zookeeper.properties
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: peer
        - containerPort: 3888
          name: leader-election
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - '[ "imok" = "$(echo ruok | nc -w 1 -q 1 127.0.0.1 2181)" ]'
        volumeMounts:
        - name: config
          mountPath: /etc/kafka
        - name: data
          mountPath: /var/lib/zookeeper/data
      volumes:
      - name: configmap
        configMap:
          name: zookeeper-config
      - name: config
        emptyDir: {}
      - name: data
        emptyDir: {}
```
and the errors are:

```
[2018-10-12 20:40:04,526] WARN Failed to resolve address: pzoo-0.pzoo (org.apache.zookeeper.server.quorum.QuorumPeer)
java.net.UnknownHostException: pzoo-0.pzoo: Name or service not known
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at java.net.InetAddress.getByName(InetAddress.java:1076)
    at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:166)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:595)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
[2018-10-12 20:40:04,527] WARN Cannot open channel to 2 at election address pzoo-1.pzoo:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.UnknownHostException: pzoo-1.pzoo
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
[2018-10-12 20:40:04,547] WARN Failed to resolve address: pzoo-1.pzoo (org.apache.zookeeper.server.quorum.QuorumPeer)
```
It's been a couple of hours now looking around in the issues, but I couldn't find the solution. If anyone could point me in the right direction...
Is there any pzoo-0 pod? Does the pzoo service exist? Can your cluster resolve other services?
@solsson yes, the cluster resolves other services. There are no pzoo pods, that's why I was wondering what that service is for. Anyway, I asked the question a few minutes before leaving the office; work is in a local VM cluster, so I won't be back until Tuesday. Thanks for the fast response.
@solsson okay, so the setup is now 3 replicas in the statefulset, and I changed the configmap to:

```
server.1=zoo-0.zoo:2888:3888:participant
server.2=zoo-1.zoo:2888:3888:participant
server.3=zoo-2.zoo:2888:3888:participant
```

but the failed-to-resolve-address error is still there: `[2018-10-16 14:17:21,595] WARN Failed to resolve address: zoo-1.zoo (org.apache.zookeeper.server.quorum.QuorumPeer)`
If the namespace is whitenfv and the headless service is zoo, how should I configure this? Something like zoo-0.zoo.whitenfv.local.cluster?
Short names should be fine. You need to dig into the headless services, Endpoints, etc. Possibly also the general DNS lookup behavior within your cluster. Did you try first with the default namespace? In other words, did this happen because of the namespace change?
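To make that concrete, a few checks that usually narrow this down (a sketch, not a full procedure; names and namespace taken from the manifests in this thread):

```shell
kubectl -n whitenfv get svc zoo pzoo         # do the headless services exist?
kubectl -n whitenfv get endpoints zoo        # did the selector attach any pod IPs?
kubectl -n whitenfv get pods -l name=zookeeper,system=whitenfv  # what the zoo selector matches
kubectl -n whitenfv exec zoo-0 -- nslookup zoo-1.zoo            # per-pod DNS record
```

If `get endpoints` shows `<none>`, the Service selector doesn't match the pod labels, and the per-pod DNS records like `zoo-1.zoo` will never exist.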
Well, I was trying only in the namespace whitenfv, didn't try the default namespace... I assumed it worked all right and jumped into this test.
If I try nslookup inside the zoo-0 pod:

```
nslookup kafka-0.kafka.whitenfv.svc.clust>
Server:  10.96.0.10
Address: 10.96.0.10#53

Name:    kafka-0.kafka.whitenfv.svc.cluster.local
Address: 10.244.0.191
```

so it works. The weird thing is that if I do the same with the zookeeper pod it won't work. `nslookup kafka-0.kafka` and `nslookup zookeeper` also work, but not `nslookup zoo-0.zookeeper`, so I still think the problem is with my labels and selectors...
@solsson okay, finally fixed the problem: it was indeed DNS errors caused by the labels and selectors of the services. So the first kafka pod schedules and runs okay; now I'm getting this error on kafka-1:
```
[2018-10-16 17:13:30,729] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentBrokerIdException: Configured broker.id 1 doesn't match stored broker.id 0 in meta.properties. If you moved your data, make sure your configured broker.id matches. If you intend to create a new broker, you should remove all data in your data directories (log.dirs).
    at kafka.server.KafkaServer.getBrokerIdAndOfflineDirs(KafkaServer.scala:628)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:201)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
    at kafka.Kafka$.main(Kafka.scala:92)
    at kafka.Kafka.main(Kafka.scala)
[2018-10-16 17:13:30,755] INFO shutting down (kafka.server.KafkaServer)
[2018-10-16 17:13:30,765] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2018-10-16 17:13:30,777] INFO EventThread shut down for session: 0x1667dceb4710001 (org.apache.zookeeper.ClientCnxn)
[2018-10-16 17:13:30,777] INFO Session: 0x1667dceb4710001 closed (org.apache.zookeeper.ZooKeeper)
[2018-10-16 17:13:30,782] INFO shut down completed (kafka.server.KafkaServer)
[2018-10-16 17:13:30,788] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
[2018-10-16 17:13:30,789] INFO shutting down (kafka.server.KafkaServer)
```
I also checked the logs on zookeeper-0, 1 and 2; everything is working okay and they managed to elect a master. Now kafka-1 tries to connect to the master zookeeper and this happens.
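For what it's worth, the stored broker id that Kafka is comparing against lives in `meta.properties` under each log dir. A minimal sketch of the check behind `InconsistentBrokerIdException` (the temp dir and the values 0/1 are illustrative, mirroring the log above):

```shell
# Simulate a log dir left over from a pod that previously ran as broker 0.
dir=$(mktemp -d)
printf 'version=0\nbroker.id=0\n' > "$dir/meta.properties"

configured=1                                          # broker.id from server.properties
stored=$(sed -n 's/^broker\.id=//p' "$dir/meta.properties")

if [ "$stored" != "$configured" ]; then
  echo "InconsistentBrokerIdException: configured $configured, stored $stored"
fi
```

In a StatefulSet this usually means kafka-1 is reusing a data directory that was initialized by a broker with a different id; as the error message itself suggests, clearing that pod's log.dirs resolves it.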
How did you fix it? What was the issue with labels and selectors?
This occurs when the name of the headless service and the serviceName in the statefulset don't match exactly. In my example, one was "zoo" and the other "pzoo"
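In other words, the per-pod DNS records zoo-0.zoo, zoo-1.zoo, ... are only created when these two fields agree (fragments trimmed from the manifests in this thread):

```yaml
# Headless Service: its metadata.name is the domain the pods register under.
kind: Service
metadata:
  name: zoo
spec:
  clusterIP: None
---
# StatefulSet: serviceName must name that headless Service.
kind: StatefulSet
spec:
  serviceName: "zoo"
```

The Service's selector must also match the pod template's labels, otherwise the Service has no Endpoints and the records are never published.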
Unfortunately, having the same issue.
Same issue: each pod refuses to resolve the hostname. The resolution for this issue is not clear. Please provide clear steps on how you resolved it.
Also share the versions of your nodes, pods, and CoreDNS, which network add-on you're using, etc.
The same problem ... Help others please @paltaa
Hey! This was 4 years ago; I don't have the code for it, and it was the selectors, as I said before. There are working Helm charts that need little to no configuration, like this one: https://github.com/bitnami/charts/tree/master/bitnami/kafka
@paltaa Other ways like Helm & operators (Strimzi, Confluent, ...) are easy, as always. But building a cluster from manifests is better in some situations.
This is a good repository with low-quality documentation.
Well, be sure to check that the labels, selectors, and ConfigMap entries match those DNS names, and it should work.