
A collection of scripts for generating a plot showing the overall Kafka cluster topology 📐

Primary language: PowerShell. License: MIT.

Kafka Surveyor

A collection of scripts for generating a plot showing the overall Kafka cluster topology of services deployed on a Kubernetes cluster.

Kafka Surveyor Graph

Approach 1: Consumer-Side Topology

Usage

  1. Ensure you can connect to your Kafka cluster using the kafka-consumer-groups.sh script that comes with Kafka, and that the script is available on the system path.
  2. Set the environment variable KAFKA_BOOTSTRAP_SERVER to the bootstrap server for communicating with Kafka, in host:port form. For example, 123.12.12.123:9320.
  3. Generate the consumer topology in JSON format by running the following command from this repository's root directory in a Linux shell:
    > ./scripts/approach-1/kafka-consumer-topology.sh | tee consumer-topology.json
  4. Visualize the topology at https://yongjie.codes/kafka-surveyor.

Explanation

This approach produces a mapping from each consumer group ID to the Kafka topics that the consumer group subscribes to. It is a straightforward use of the kafka-consumer-groups.sh script that comes with Kafka: the script queries the Kafka cluster for the list of consumer groups and then for the topics that each group subscribes to.

Note that this approach cannot show which services are producing to which Kafka topics (see Approach 2 below for that).
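
The query described above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the function names topics_from_describe and print_consumer_topology are hypothetical, and the parser assumes that the output of `--describe` has a header row containing a TOPIC column (its position is read from the header because it differs between Kafka versions).

```shell
#!/usr/bin/env bash

# Sketch only: extract the distinct topic names from the output of
# `kafka-consumer-groups.sh --describe --group <group>` read on stdin.
# Assumption: a header row with a TOPIC column precedes the data rows.
topics_from_describe() {
  awk '
    # Until the header is found, look for the column named TOPIC.
    col == 0 { for (i = 1; i <= NF; i++) if ($i == "TOPIC") col = i; next }
    # Data rows: print the topic column, skipping blank lines and repeated headers.
    NF >= col && $col != "TOPIC" { print $col }
  ' | sort -u
}

# Hypothetical driver: list every consumer group, then describe each one and
# print "<group>: <topic> <topic> ...". Requires KAFKA_BOOTSTRAP_SERVER.
print_consumer_topology() {
  local group
  kafka-consumer-groups.sh --bootstrap-server "$KAFKA_BOOTSTRAP_SERVER" --list |
  while read -r group; do
    printf '%s: ' "$group"
    kafka-consumer-groups.sh --bootstrap-server "$KAFKA_BOOTSTRAP_SERVER" \
      --describe --group "$group" < /dev/null | topics_from_describe | tr '\n' ' '
    echo
  done
}
```

Calling print_consumer_topology against a reachable cluster prints one line per consumer group. The plain-text output here is only for illustration; the repository's script emits JSON for the visualizer.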

Approach 2: Overall Topology (Java-only)

Note: The scripts may take a long time to run because they query each service sequentially.

Usage

  1. Ensure you can connect to your Kubernetes cluster using the kubectl command.

  2. Ensure that you can connect to the services that you are interested in via JMX using Jmxterm (see the Explanation section below for details and sample command).

  3. Ensure the environment variable JMXTERM_PATH is set to the path to the Uber JAR for Jmxterm.

  4. Generate the overall topology in JSON format by running the following command from this repository's root directory

    • in a Linux shell:

      > ./scripts/approach-2/kafka-producer-topology.sh |
      tee producer-topology.json
      > ./scripts/approach-2/kafka-consumer-topology.sh |
      tee consumer-topology.json
    • or in Windows PowerShell:

      $ .\scripts\approach-2\kafka-producer-topology.ps1 |
      Tee-Object -FilePath producer-topology.json
      $ .\scripts\approach-2\kafka-consumer-topology.ps1 |
      Tee-Object -FilePath consumer-topology.json
  5. Combine the two JSON files and visualize the topology at https://yongjie.codes/kafka-surveyor.

Explanation

This approach produces mappings of:

  1. Service name to Kafka topics that the service is consuming from, and
  2. Service name to Kafka topics that the service is producing to,

allowing us to plot the overall topology.

The "trick" to this approach is to realize that the official Kafka producer and consumer clients expose certain metrics via JMX (see the Wikipedia article on JMX for an overview). By querying these metrics, we can associate a particular service with the Kafka topics that the service is producing to / consuming from.

In particular, this approach is broken down into two main steps:

  1. For each service that we are interested in, obtain the IP address (or IP addresses, if there are multiple instances of the service).

    For services running on Kubernetes, the IP address may be obtained using a variant of the following command:

    > kubectl get pods --selector="app=<your-app-label-here>" \
    --output custom-columns="IP:.status.podIP"
  2. For each IP address, obtain the list of topics that the service at that IP address is producing to / consuming from.

    This may be achieved with Jmxterm, using commands like the following:

    # Create a command file so that Jmxterm can be run non-interactively.
    > echo "beans" > jmxterm-commands.in
    
    # Use Jmxterm to list the MBeans exposed via JMX.
    > java -jar <path-to-jmxterm-uber-jar-file> --noninteract --verbose silent \
     --url "<your-service-IP-here>:9010" --input jmxterm-commands.in
    

    To keep only the lines that name Kafka topics, pipe the output of the above command to grep "producer-topic-metrics" for the topics that the service is producing to, or to grep "consumer-fetch-manager-metrics" for the topics that it is consuming from.
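
The two steps above can be combined roughly as follows. This is a sketch under stated assumptions rather than the repository's actual script: the function names topics_from_beans and service_beans are hypothetical, JMX is assumed to listen on port 9010 as in the sample command, and each relevant bean name is assumed to carry a `topic=<name>` key alongside `type=<metric-type>`.

```shell
#!/usr/bin/env bash

# Sketch only: read Jmxterm `beans` output on stdin and print the distinct
# topic names of beans whose type matches $1.
# Assumed bean name shape (key order may vary), e.g.
#   kafka.producer:client-id=p1,topic=orders,type=producer-topic-metrics
topics_from_beans() {
  grep -F "type=$1" |
    sed -n 's/.*[:,]topic=\([^,]*\).*/\1/p' |
    sort -u
}

# Hypothetical driver: print the bean list of every pod carrying the given
# app label. Requires kubectl access and JMXTERM_PATH; port 9010 is assumed.
service_beans() {
  local label="$1" ip
  echo "beans" > jmxterm-commands.in
  kubectl get pods --selector="app=${label}" \
      --output custom-columns="IP:.status.podIP" --no-headers |
  while read -r ip; do
    # stdin is redirected so that java does not swallow the remaining IPs.
    java -jar "$JMXTERM_PATH" --noninteract --verbose silent \
      --url "${ip}:9010" --input jmxterm-commands.in < /dev/null
  done
}
```

For example, `service_beans my-app > beans.txt` followed by `topics_from_beans producer-topic-metrics < beans.txt` and `topics_from_beans consumer-fetch-manager-metrics < beans.txt` would list the produced and consumed topics for a service labelled my-app (a placeholder label). Saving the bean list to a file first avoids querying every pod twice.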

TODO:

  1. Add code and documentation for d3.js visualization.
  2. Find a way to map the Kafka topology of services running in Go.
  3. Visually differentiate producers, consumers, and topics on the d3.js visualization.