Yolean/kubernetes-kafka

External access on AKS

mrfoh opened this issue · 13 comments

mrfoh commented

I have a Kubernetes cluster running on AKS and want to configure it so that Kafka clients outside the cluster can access the Kafka cluster.

Following your guide, I have configured my outside services to use the LoadBalancer service type and the same loadBalancerIP.

I also updated the init.sh script in 10broker-config.yml to:

OUTSIDE_HOST=$(kubectl get svc outside-${KAFKA_BROKER_ID} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Looking at the Kubernetes dashboard, only one of the services gets deployed.

The /etc/kafka/server.properties on the corresponding pod shows

advertised.listeners=OUTSIDE://:32401,PLAINTEXT://:9092

I think I have a configuration issue somewhere, but I'm not sure where.
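
The empty host in the OUTSIDE listener above (OUTSIDE://:32401, with nothing between // and the port) suggests the jsonpath lookup returned an empty string. One common cause is that the LoadBalancer's external IP had not been assigned yet when init.sh ran. A minimal retry sketch for init.sh, with an arbitrary poll interval:

OUTSIDE_HOST=""
while [ -z "$OUTSIDE_HOST" ]; do
  OUTSIDE_HOST=$(kubectl get svc outside-${KAFKA_BROKER_ID} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  # external IP assignment on AKS can take a minute or two after the service is created
  [ -z "$OUTSIDE_HOST" ] && sleep 5
done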

Have you checked the logs of the init container (kubectl logs -c init-config kafka-1) for error messages?

To test the shell commands in the init script, you can add a tail -f /dev/null to the script and kubectl exec into the pod while it is in the Init state.

If the lookup succeeds, you will have an annotation with the external address or name.
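
For example, assuming the kafka namespace this repo uses:

# init container logs
kubectl -n kafka logs kafka-1 -c init-config
# shell into the init container (works while it's held open by tail -f /dev/null)
kubectl -n kafka exec -it kafka-1 -c init-config -- /bin/bash
# list the pod's annotations to see whether an external address was recorded
kubectl -n kafka get pod kafka-1 -o jsonpath='{.metadata.annotations}'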

What guide are you talking about? I have the same problem (not on AKS, but using an HA private/public VPC on EC2).

I have set the outside-access services to the LoadBalancer type and set a CNAME pointing to my AWS load balancers.

I have set the init script to create:
OUTSIDE_PORT=3240${KAFKA_BROKER_ID}
OUTSIDE_HOST=broker${KAFKA_BROKER_ID}.my-domain:${OUTSIDE_PORT}
INSIDE_HOST=kafka-${KAFKA_BROKER_ID}.broker.kafka.svc.cluster.local:9092

I have checked the kafka logs and verified the hostnames are correct per broker.

I can successfully connect and consume messages internally with kafkacat from the test namespace.
I can successfully connect to the brokers externally with kafkacat. But when I attempt to consume messages (I have not actually tried producing yet), I receive an error that each broker hostname is not resolving.

Does this have anything to do with the fact that the load balancers have an IP per availability zone?

What guide are you talking about?

I suppose it refers to https://github.com/Yolean/kubernetes-kafka/tree/master/outside-services

I can successfully connect to the brokers externally with kafkacat. But When I attempt to consume messages (have not actually tried producing yet), I receive the error that each broker hostname is not resolving.

That could indicate that the bootstrap connection gives you the internal addresses. This is quite tricky; you might want to look into Kafka's docs on advertised listeners.
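
One quick way to see what the brokers actually advertise is to request cluster metadata with kafkacat, using the hostnames from the comment above:

# prints each broker's advertised host:port as returned by the bootstrap connection
kafkacat -L -b broker0.my-domain:32400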

@mrfoh I'm trying to do the same, but the guide isn't clear enough for me, beyond the fact that I have to change the type in outside-{0,1,2}.yml to LoadBalancer.

Could you please provide the final 10broker-config.yml file?

mrfoh commented

@marcosflobo In the end I couldn't set this up; we went with a managed Kafka service. I did find out that you can't use the same IP for multiple LoadBalancer services.

Thanks @mrfoh for your quick answer.
I can tell you what I did:

  1. In each outside-{0,1,2}.yml file, change the type to LoadBalancer and remove the nodePort: 3240* parameter (see the sketch after this list).
  2. In 10broker-config.yml I changed line 31:
    - OUTSIDE_HOST=$(kubectl get node "$NODE_NAME" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')
    + OUTSIDE_HOST=$(kubectl get svc outside-${KAFKA_BROKER_ID} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  3. I re-deployed the whole Kafka setup (since the config changed) and deployed the 3 outside YAMLs. All on AKS (Azure Kubernetes Service).
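
For reference, a minimal sketch of what outside-0.yml could look like after step 1. The selector labels and targetPort follow the repo's NodePort manifests; double-check them against your copy:

kind: Service
apiVersion: v1
metadata:
  name: outside-0
  namespace: kafka
spec:
  type: LoadBalancer
  selector:
    app: kafka
    kafka-broker-id: "0"
  ports:
  - protocol: TCP
    port: 32400
    targetPort: 9094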

This brings 3 outside services with External Endpoints (xxx.xxx.xxx.xxxx:32400, xxx.xxx.xxx.xxxx:32401 and xxx.xxx.xxx.xxxx:32402) pointing to each of the kafka-{0,1,2} pods. So far so good.
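
You can confirm the assigned external addresses with:

kubectl -n kafka get svc outside-0 outside-1 outside-2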

Now, I test from outside of the Kubernetes cluster:

  • Run my app to insert data into Kafka on the topic "mytopic". This works.
  • Run bin/kafka-console-consumer.sh --bootstrap-server xxx.xxx.xxx.xxxx:32400,xxx.xxx.xxx.xxxx:32401,xxx.xxx.xxx.xxxx:32402 --topic mytopic --from-beginning

The test succeeds.

It takes some time for the Kafka brokers to sync, but those changes worked for me.

mrfoh commented

@marcosflobo Thanks this will come in handy

I would just use hostPort and open the port on the machines. Then you don't need any load balancer for Kafka; you just need to somehow find out what the IP/DNS name of the VM running Kafka is.
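
A minimal sketch of the idea, as an excerpt from the kafka StatefulSet's container spec (the port number is illustrative; the repo's hostport variant differs in detail):

ports:
- name: outside
  containerPort: 9094
  hostPort: 9094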

Hi @Hermain, could you please describe the steps and changes to do that? Thanks!

https://github.com/Yolean/kubernetes-kafka/tree/master/outside-services#outside-access-with-hostport
That's what I tried to do here.
OUTSIDE_HOST is set by a script that's executed (you can use kubectl) to find the DNS name of the Kubernetes host. Code like that works with AWS; I don't know how to get a node's DNS name on Azure (or whether the nodes even have one), but maybe an IP will do too.
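
For example, on AWS the node's external DNS name can be looked up the same way the stock init script looks up the InternalIP. The ExternalDNS address type is populated on EC2 nodes; Azure nodes may only expose an IP:

OUTSIDE_HOST=$(kubectl get node "$NODE_NAME" -o jsonpath='{.status.addresses[?(@.type=="ExternalDNS")].address}')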

@Hermain I was going to go the hostPort route instead of NodePorts, but GKE (version 1.11.8-gke.10) is putting all 3 of my brokers on the same node (I have 3 nodes in my cluster). I ran the deployment files in the order listed in the main README for this repo, and even deleted the deployment and started over, creating the NodePort services before running the ./kafka YAMLs (at someone's suggestion).

I didn't make any modifications to the repo besides changing InternalIP to ExternalIP in the init config map, as many have.

The zookeeper pods did schedule themselves correctly across multiple nodes (as do all my other deployments), so I don't know why the kafka ones consistently stick to one node. Any ideas/tips to try?

If you use hostPort with the same port, then they cannot be scheduled on the same host, since the port is already taken. Otherwise you can use taints and tolerations; see also the anti-affinity sketch below. Usually, though, Kubernetes spreads the load evenly. Possibly your other nodes don't have enough resources?
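
Pod anti-affinity (not mentioned above, but the standard Kubernetes mechanism for this) can also force one broker per node. A sketch for the kafka StatefulSet's pod template, where app: kafka matches the repo's existing labels:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: kafka
      topologyKey: kubernetes.io/hostname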

Hello @marcosflobo, your steps above were very helpful for configuring public network access, thank you. But the steps are not complete: you also need to modify the RBAC configuration to grant the service account read access to services, otherwise kafka reports the following error during init:
Error from server (Forbidden): services "outside-2" is forbidden: User "system:serviceaccount:kafka:default" cannot get resource "services" in API group "" in the namespace "kafka"
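
You can confirm the missing permission without redeploying anything:

kubectl -n kafka auth can-i get services --as=system:serviceaccount:kafka:default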

Extend the RBAC configuration:
vim rbac-namespace-default/pod-labler.yml # to add a services read role
resources:
- pods
- services # <--- add this
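
For context, a sketch of the whole edited Role; the verbs in your copy of pod-labler.yml may differ, but get is what the service lookup needs:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-labler
  namespace: kafka
rules:
- apiGroups: [""]
  resources:
  - pods
  - services # <--- added
  verbs:
  - get
  - update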