/capillary

Storm Spout + Kafka State Inspector

Primary LanguageScalaMIT LicenseMIT

Capillary is a small web application that displays the state and deltas of Kafka-based storm topologies with a Kafka >= 0.8. It also provides an API for fetching this information for monitoring purposes.

Overview

delta

delta

topics

topics

delta history

delta_change

statistics of offset

statistic

Capillary does the following:

  • Fetches information about running topologies from Zookeeper
  • Fetches information about the topic's partitions and offsets from the Storm spout state in Zookeeper
  • Fetches all topics of kafka cluster
  • Fetches offset, ISR, replicas, leader, offset increment for each partition exists in Kafka
  • Fetches partition leader distribution of Kafka cluster
  • Show Storm cluster summary
  • Show Storm configuration
  • Show delta change history

API

You can hit /api/status?toporoot=$TOPOROOT&topic=$TOPIC to get a JSON output.

Name

At Keen IO we name our projects after plants. Services often use tree or large plant names and large infrastructure projects may use more general names.

Capillary action moves water through a plant's xylem. Since Kafka and Storm handle the flow of data (water) through our infrastructure this seems a fitting name!

Running It

Wanna run it? Awesome! This is a Play Framework app so it works the way all other Play apps do:

$ sbt universal:package-zip-tarball
… lots of crazy computer talk …
[success] Total time: 4 s, completed Aug 8, 2014 2:04:06 PM

You should find a tarball at target/universal/capillary-VERSION.tgz.

That there tgz file can be untarballerated (or whatever the technical term is for that) and run like so:

tar xzf capillary-1.2.tgz
cd capillary-1.2
bin/capillary
Play server process ID is 24223
[info] play - Application started (Prod)
[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9000

Then you can hit that URL via your web browser with some args:

http://127.0.0.1:9000

Capillary will try and dig information out of Zookeeper to show what topologies you have running.

Configuration

By way of Play Capillary uses Typesafe Config so you have lots of options.

Most simply you can specify properties when starting the app or include conf/application-local.conf at runtime. Here's our runtime properties:

bin/capillary -Dcapillary.zookeepers=sj01-prod-zk-0000:2181,sj01-prod-zk-0001:2181,sj01-prod-zk-0002:2181 -Dcapillary.kafka.zkroot=kafka8 -Dcapillary.storm.zkroot=keen_storm

Here are the configuration options:

capillary.zookeepers

Set this to a list of your ZKs like every other ZK thingie it uses something like host1:2181,host2:2181.

capillary.kafka.zkroot

If your Kafka chroots to a subdirectory (or whatever it's called) in Zookeeper then you'll want to set this. We use 'kafka8' after upgrading from 0.7.

capillary.storm.zkroot

If your Storm chroots to a subdirectory (or whatever it's called) in Zookeeper then you'll want to set this. We use 'keen-storm'.

capillary.use.trident

Set this to true if you would like to monitor trident topologies instead of pure storm. This will modify the Zookeeper paths used to match Trident's convention.

Other Notes

  • Uses Scala 2.10.4 because Kafka (at the time of this writing) doesn't have 2.11 artifacts and explodes
  • Doesn't use watches, so updates every time you refresh the page

Structure

Storm's Kafka spout stores it's committed state in a ZK structure like this:

/$SPOUT_ROOT/$CONSUMER_ID/$PARTITION_ID

Each $PARTITION_ID stores state as a JSON object that looks like:

{
  "topology": {
    "id": "$SOME_UNIQUE_ID",
    "name": "$TOPOLOGY_NAME"
  },
  "offset": $CURRENT_OFFSET,
  "partition": $PARTITION_NUMBER,
  "broker": {
    "host": "$HOST_ADDRESS",
    "port": 9092
  },
  "topic": "$TOPIC_NAME"
}