Yolean/kubernetes-kafka

WARN Failed to resolve address: zoo-1.zoo

paltaa opened this issue · 15 comments

Hey, I'm trying to get ZooKeeper running but it seems the pods cannot see each other. I guess I probably have a problem with labels and selectors. Here are my files:

configmap

kind: ConfigMap
metadata:
  name: zookeeper-config
  namespace: whitenfv
  labels:
    name: zookeeper
    system: whitenfv
    app: zookeeper
apiVersion: v1
data:
  init.sh: |-
    #!/bin/bash
    set -x

    [ -z "$ID_OFFSET" ] && ID_OFFSET=1
    export ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-} + $ID_OFFSET))
    echo "${ZOOKEEPER_SERVER_ID:-1}" | tee /var/lib/zookeeper/data/myid
    cp -Lur /etc/kafka-configmap/* /etc/kafka/
    sed -i "s/server\.$ZOOKEEPER_SERVER_ID\=[a-z0-9.-]*/server.$ZOOKEEPER_SERVER_ID=0.0.0.0/" /etc/kafka/zookeeper.properties

  zookeeper.properties: |-
    tickTime=2000
    dataDir=/var/lib/zookeeper/data
    dataLogDir=/var/lib/zookeeper/log
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=pzoo-0.pzoo:2888:3888:participant
    server.2=pzoo-1.pzoo:2888:3888:participant
    server.3=pzoo-2.pzoo:2888:3888:participant
    server.4=zoo-0.zoo:2888:3888:participant
    server.5=zoo-1.zoo:2888:3888:participant

  log4j.properties: |-
    log4j.rootLogger=INFO, stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n

    # Suppress connection log messages, three lines per livenessProbe execution
    log4j.logger.org.apache.zookeeper.server.NIOServerCnxnFactory=WARN
    log4j.logger.org.apache.zookeeper.server.NIOServerCnxn=WARN


zoo service

apiVersion: v1
kind: Service
metadata:
  name: zoo
  namespace: whitenfv
  labels: 
    name: zookeeper
    system: whitenfv
    app: zookeeper
spec:
  ports:
  - port: 2888
    name: peer
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    name: zookeeper
    system: whitenfv

pzoo service

apiVersion: v1
kind: Service
metadata:
  name: pzoo
  namespace: whitenfv
  labels:
    name: zookeeper
    system: whitenfv
    app: zookeeper

spec:
  ports:
  - port: 2888
    name: peer
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    name: zookeeper
    system: whitenfv

zookeeper service

apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: whitenfv
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zookeeper
    namespace: whitenfv

and finally statefulset

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zoo
  namespace: whitenfv
  labels:
    name: zookeeper
    system: whitenfv
    app: zookeeper
spec:
  selector:
    matchLabels:
      app: zookeeper
  serviceName: "zoo"
  replicas: 2
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        app: zookeeper
        namespace: whitenfv

      annotations:
    spec:
      terminationGracePeriodSeconds: 10
      initContainers:
      - name: init-config
        image: solsson/kafka-initutils@sha256:18bf01c2c756b550103a99b3c14f741acccea106072cd37155c6d24be4edd6e2
        command: ['/bin/bash', '/etc/kafka-configmap/init.sh']
        env:
        - name: ID_OFFSET
          value: "4"
        volumeMounts:
        - name: configmap
          mountPath: /etc/kafka-configmap
        - name: config
          mountPath: /etc/kafka
        - name: data
          mountPath: /var/lib/zookeeper/data
      containers:
      - name: zookeeper
        image: solsson/kafka:1.0.2@sha256:7fdb326994bcde133c777d888d06863b7c1a0e80f043582816715d76643ab789
        env:
        - name: KAFKA_LOG4J_OPTS
          value: -Dlog4j.configuration=file:/etc/kafka/log4j.properties
        command:
        - ./bin/zookeeper-server-start.sh
        - /etc/kafka/zookeeper.properties
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: peer
        - containerPort: 3888
          name: leader-election
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - '[ "imok" = "$(echo ruok | nc -w 1 -q 1 127.0.0.1 2181)" ]'
        volumeMounts:
        - name: config
          mountPath: /etc/kafka
        - name: data
          mountPath: /var/lib/zookeeper/data
      volumes:
      - name: configmap
        configMap:
          name: zookeeper-config
      - name: config
        emptyDir: {}
      - name: data
        emptyDir: {}

and the errors are:

[2018-10-12 20:40:04,526] WARN Failed to resolve address: pzoo-0.pzoo (org.apache.zookeeper.server.quorum.QuorumPeer)
java.net.UnknownHostException: pzoo-0.pzoo: Name or service not known
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
	at java.net.InetAddress.getAllByName(InetAddress.java:1192)
	at java.net.InetAddress.getAllByName(InetAddress.java:1126)
	at java.net.InetAddress.getByName(InetAddress.java:1076)
	at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:166)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:595)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
[2018-10-12 20:40:04,527] WARN Cannot open channel to 2 at election address pzoo-1.pzoo:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
java.net.UnknownHostException: pzoo-1.pzoo
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:614)
	at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:843)
	at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:913)
[2018-10-12 20:40:04,547] WARN Failed to resolve address: pzoo-1.pzoo (org.apache.zookeeper.server.quorum.QuorumPeer)


It's been a couple of hours now looking around in the issues but I couldn't find the solution. If anyone could point me in the right direction...

Is there any pzoo-0 pod? Does the pzoo service exist? Can your cluster resolve other services?
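Something like this should show it (assuming your whitenfv namespace; busybox is only an example image for a throwaway DNS test pod):

# do the pods and the headless services exist?
kubectl -n whitenfv get pods -l app=zookeeper -o wide
kubectl -n whitenfv get svc zoo pzoo

# can an unrelated pod resolve a service name at all?
kubectl -n whitenfv run dns-test -it --rm --restart=Never --image=busybox:1.28 -- nslookup zoo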

@solsson yes, the cluster resolves other services. There are no pzoo pods, that's why I was wondering what that service is for. Anyway, I asked the question a few minutes before leaving the office; work is in a local VM cluster so I won't be back until Tuesday. Thanks for the fast response.

@solsson okay, so the setup is 3 replicas in the statefulset, and I changed the configmap to:


    server.1=zoo-0.zoo:2888:3888:participant
    server.2=zoo-1.zoo:2888:3888:participant
    server.3=zoo-2.zoo:2888:3888:participant

but I still get the failed-to-resolve-address warning: [2018-10-16 14:17:21,595] WARN Failed to resolve address: zoo-1.zoo (org.apache.zookeeper.server.quorum.QuorumPeer)

If the namespace is whitenfv and the headless service is zoo, how should I configure this? Something like zoo-0.zoo.whitenfv.local.cluster?
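If I read the Kubernetes DNS convention right, the fully qualified form for a pod behind a headless service is <pod>.<service>.<namespace>.svc.cluster.local (assuming the default cluster domain), so I'd expect something like:

nslookup zoo-0.zoo.whitenfv.svc.cluster.local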

Short names should be fine. You need to dig into the headless services, Endpoints etc. Possibly also the general DNS lookup behavior within your cluster. Did you try first with the default namespace? In other words, did this happen because of the namespace change?
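For example (assuming the whitenfv namespace), an empty ENDPOINTS column here would mean the service selectors don't match the pod labels:

kubectl -n whitenfv get endpoints zoo pzoo
kubectl -n whitenfv get pods --show-labels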


Well, I was trying only in the namespace whitenfv, didn't try in the default namespace... assumed it worked all right and jumped into this test.

If I try nslookup inside the zoo-0 pod:

nslookup kafka-0.kafka.whitenfv.svc.cluster.local
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	kafka-0.kafka.whitenfv.svc.cluster.local
Address: 10.244.0.191

So it works. Weird thing: if I do the same with the zookeeper pod it won't work.

Also, nslookup kafka-0.kafka and nslookup zookeeper work, but nslookup zoo-0.zookeeper does not, so I think the problem is still with my labels and selectors...
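As a sanity check I'm comparing the service selector against the pod labels, roughly like this (names taken from my manifests above):

kubectl -n whitenfv get svc zoo -o jsonpath='{.spec.selector}'
kubectl -n whitenfv get pod zoo-0 -o jsonpath='{.metadata.labels}'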

@solsson okay, finally fixed the problem: it was indeed DNS errors caused by the labels and selectors of the services. So the first kafka pod schedules and runs okay, but now I'm getting this error on kafka-1:

[2018-10-16 17:13:30,729] FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentBrokerIdException: Configured broker.id 1 doesn't match stored broker.id 0 in meta.properties. If you moved your data, make sure your configured broker.id matches. If you intend to create a new broker, you should remove all data in your data directories (log.dirs).
	at kafka.server.KafkaServer.getBrokerIdAndOfflineDirs(KafkaServer.scala:628)
	at kafka.server.KafkaServer.startup(KafkaServer.scala:201)
	at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
	at kafka.Kafka$.main(Kafka.scala:92)
	at kafka.Kafka.main(Kafka.scala)
[2018-10-16 17:13:30,755] INFO shutting down (kafka.server.KafkaServer)
[2018-10-16 17:13:30,765] INFO Terminate ZkClient event thread. (org.I0Itec.zkclient.ZkEventThread)
[2018-10-16 17:13:30,777] INFO EventThread shut down for session: 0x1667dceb4710001 (org.apache.zookeeper.ClientCnxn)
[2018-10-16 17:13:30,777] INFO Session: 0x1667dceb4710001 closed (org.apache.zookeeper.ZooKeeper)
[2018-10-16 17:13:30,782] INFO shut down completed (kafka.server.KafkaServer)
[2018-10-16 17:13:30,788] FATAL Exiting Kafka. (kafka.server.KafkaServerStartable)
[2018-10-16 17:13:30,789] INFO shutting down (kafka.server.KafkaServer)

Also checked the logs on zookeeper-0, 1 and 2: everything is working okay and they managed to elect a master. Now kafka-1 tries to connect to the master zookeeper and this happens.
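In case it helps anyone: the error means kafka-1 found a meta.properties written with broker.id 0, so the data directory was apparently reused. Something like this shows what is stored; the /var/lib/kafka/data path is an assumption on my part, check log.dirs in your server.properties. On a fresh broker, removing that file or the whole data dir (as the error message itself suggests) lets it start clean:

# inspect the stored broker id (path is an assumption based on the default config)
kubectl -n whitenfv exec kafka-1 -- cat /var/lib/kafka/data/meta.properties
# if the kafka container keeps crashing, run the same check from an init or debug container mounting the data volume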

How did you fix it? What were the issues with labels and selectors?

This occurs when the name of the headless service and the serviceName in the statefulset don't match exactly. In my example, one was "zoo" and the other "pzoo".
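A quick way to compare the two (namespace and names as in the manifests above):

kubectl -n whitenfv get statefulset zoo -o jsonpath='{.spec.serviceName}'
kubectl -n whitenfv get svc zoo -o jsonpath='{.metadata.name}'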

Unfortunately, having the same issue.

Same issue: each pod refuses to resolve the hostname. The resolution for this issue is not clear. Please provide clear steps on how you resolved it.
Also share the versions of your nodes, pods and coredns, what network add-on you're using, etc.

The same problem ... Help others please @paltaa

Hey! This was 4 years ago and I don't have the code for it anymore, but it was the selectors, as I said before. There are working Helm charts that need little to no configuration, like this one: https://github.com/bitnami/charts/tree/master/bitnami/kafka

@paltaa Other ways like Helm & Operators (Strimzi, Confluent, ...) are easy, as always. But building a cluster using manifests is better in some situations.
This is a good repository with low-quality documentation.

Well, be sure to check that the labels, selectors and ConfigMap match those DNS names, and it should work.