Zipkin Storage: Kafka [EXPERIMENTAL]

Kafka-based storage for Zipkin.

                    +----------------------------*zipkin*----------------------------------------------
                    |                                     [ dependency-storage ]--->( dependencies      )
                    |                                                  ^        +-->( autocomplete-tags )
( collected-spans )-|->[ partitioning ]   [ aggregation ]    [ trace-storage ]--+-->( traces            )
  via http, kafka,  |       |                    ^    |         ^      |        +-->( service-names     )
  amq, grpc, etc.   +-------|--------------------|----|---------|------|-------------------------------
                            |                    |    |         |      |
----------------------------|--------------------|----|---------|------|-------------------------------
                            +-->( spans )--------+----+---------|      |
                                                      |         |      |
*kafka*                                               +->( traces )    |
 topics                                               |                |
                                                      +->( dependencies )

-------------------------------------------------------------------------------------------------------

Spans collected via different transports are partitioned by traceId and stored in a partitioned spans Kafka topic. Partitioned spans are then aggregated into traces and then into dependency links; both results are emitted into Kafka topics as well. These three topics are used as the source for local stores (Kafka Streams stores) that support the Zipkin query and search APIs.
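The flow above can be pictured with a small in-memory sketch. It is not the project's implementation (which relies on Kafka topics and Kafka Streams); the Span class, partitionFor and aggregate are hypothetical names, and the hash-modulo partitioner is a stand-in for whatever key-based partitioning Kafka applies. It only shows why keying by traceId keeps all spans of a trace together for aggregation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative in-memory model only; the real storage uses Kafka topics and Kafka Streams.
class SpanPartitioningSketch {

  // Hypothetical minimal span: only the fields needed for this illustration.
  static final class Span {
    final String traceId;
    final String id;

    Span(String traceId, String id) {
      this.traceId = traceId;
      this.id = id;
    }
  }

  // Stand-in for a key-based partitioner: equal traceIds always map to the same partition.
  static int partitionFor(String traceId, int numPartitions) {
    return Math.floorMod(traceId.hashCode(), numPartitions);
  }

  // Groups spans per partition, then per traceId, mimicking the "spans -> traces" step.
  static Map<Integer, Map<String, List<Span>>> aggregate(List<Span> spans, int numPartitions) {
    Map<Integer, Map<String, List<Span>>> byPartition = new TreeMap<>();
    for (Span span : spans) {
      int partition = partitionFor(span.traceId, numPartitions);
      byPartition
          .computeIfAbsent(partition, p -> new TreeMap<>())
          .computeIfAbsent(span.traceId, t -> new ArrayList<>())
          .add(span);
    }
    return byPartition;
  }

  public static void main(String[] args) {
    List<Span> collected = new ArrayList<>();
    collected.add(new Span("trace-a", "1"));
    collected.add(new Span("trace-b", "1"));
    collected.add(new Span("trace-a", "2"));
    aggregate(collected, 3).forEach((partition, traces) ->
        traces.forEach((traceId, grouped) ->
            System.out.println("partition " + partition + " " + traceId + " spans=" + grouped.size())));
  }
}
```

Because the partition is derived from the traceId alone, each aggregating instance can build complete traces from its own partitions without coordinating with the others.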

Design

Configuration

Use-cases

Replacement for batch-oriented Zipkin dependencies

A limitation of the zipkin-dependencies module is that it must be scheduled to run at a defined frequency. This batch-oriented execution leaves dependency values out of date until the next run.

Kafka-based storage aggregates dependencies as spans are received, allowing (near-)real-time calculation of dependency metrics.
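As a rough illustration of incremental aggregation, the sketch below folds each completed trace into running call counts per parent and child service, so the totals are current after every trace instead of after a batch run. It is a simplification under stated assumptions: the Span class and the "parent->child" key format are hypothetical, and Zipkin's actual linking logic takes more span details into account than this parent/child service comparison.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, in-memory illustration of streaming dependency aggregation.
class DependencyLinkSketch {

  // Hypothetical minimal span: id, optional parent id, and the reporting service.
  static final class Span {
    final String id;
    final String parentId; // null for a root span
    final String serviceName;

    Span(String id, String parentId, String serviceName) {
      this.id = id;
      this.parentId = parentId;
      this.serviceName = serviceName;
    }
  }

  // Running call counts keyed by "parentService->childService".
  private final Map<String, Long> callCounts = new TreeMap<>();

  // Folds one completed trace into the running link counts.
  void onTrace(List<Span> trace) {
    Map<String, String> serviceBySpanId = new HashMap<>();
    for (Span span : trace) {
      serviceBySpanId.put(span.id, span.serviceName);
    }
    for (Span span : trace) {
      if (span.parentId == null) continue; // root span has no caller
      String parentService = serviceBySpanId.get(span.parentId);
      // Skip unknown parents and calls that stay within the same service.
      if (parentService == null || parentService.equals(span.serviceName)) continue;
      callCounts.merge(parentService + "->" + span.serviceName, 1L, Long::sum);
    }
  }

  Map<String, Long> links() {
    return Collections.unmodifiableMap(callCounts);
  }

  public static void main(String[] args) {
    DependencyLinkSketch sketch = new DependencyLinkSketch();
    sketch.onTrace(Arrays.asList(
        new Span("1", null, "frontend"),
        new Span("2", "1", "backend")));
    System.out.println(sketch.links());
  }
}
```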

To run only this use case, the other components can be disabled. A prepared profile enables just the aggregation and search of dependency graphs.

This profile can be enabled by adding the Java option: -Dspring.profiles.active=kafka-only-dependencies

The Docker image includes an environment variable to set the profile:

MODULE_OPTS="-Dloader.path=lib -Dspring.profiles.active=kafka-only-dependencies"

To try it out, there is a Docker Compose configuration ready to test.

If an existing Kafka collector is already sending traces downstream into an existing storage, a different Kafka consumer group ID can be used for zipkin-storage-kafka so that it consumes the same traces in parallel. Otherwise, if Kafka transport is not available, you can forward spans from another Zipkin server to zipkin-storage-kafka.
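The consumer-group idea can be sketched with plain Kafka consumer properties. "bootstrap.servers" and "group.id" are standard Kafka consumer configuration keys; the group names and the ConsumerGroupSketch class are hypothetical, and how zipkin-storage-kafka actually exposes these settings is not shown here. Consumers in different groups each receive every record of the topic they subscribe to, which is what allows the two pipelines to read the same spans independently.

```java
import java.util.Properties;

// Illustration only: builds consumer settings, it does not connect to Kafka.
class ConsumerGroupSketch {

  // Returns minimal consumer settings for the given broker list and consumer group.
  static Properties consumerConfig(String bootstrapServers, String groupId) {
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", bootstrapServers);
    props.setProperty("group.id", groupId);
    return props;
  }

  public static void main(String[] args) {
    // Hypothetical group ids: one for the existing collector, one for the new storage.
    Properties existing = consumerConfig("localhost:19092", "zipkin-existing-storage");
    Properties additional = consumerConfig("localhost:19092", "zipkin-storage-kafka");
    System.out.println(existing.getProperty("group.id") + " and "
        + additional.getProperty("group.id") + " read the same topic independently");
  }
}
```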

Building

To build the project you will need Java 8+.

make build

To run the tests:

make test

If you want to build a Docker image:

make docker-build

Run locally

To run locally, first you need to get Zipkin binaries:

make get-zipkin

By default, Zipkin expects a Kafka broker to be running on localhost:19092.

Then run Zipkin locally:

make run-local

To validate the storage, make sure that the Kafka topics are created so that the Kafka Streams instances can be initialized properly:

make kafka-topics
make zipkin-test

This will open a browser and check that a trace has been registered.

It will then send another trace after one minute (the trace timeout) plus 1 second, to trigger aggregation and make the dependency graph visible.
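Why a second trace is needed can be sketched as follows, assuming that aggregation is driven by record timestamps rather than wall-clock time (as is usual for Kafka Streams windowing). In this hypothetical in-memory model, a trace is only considered complete once a later record moves the observed time past its last span by more than the timeout. The class and method names are illustrative and not part of the project.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative inactivity-gap model; not the project's actual aggregation code.
class TraceTimeoutSketch {
  private final long timeoutMillis;
  // Timestamp of the most recent span seen for each still-open trace.
  private final Map<String, Long> lastSeen = new LinkedHashMap<>();
  // Highest record timestamp observed so far; it only advances when records arrive.
  private long observedTime = Long.MIN_VALUE;

  TraceTimeoutSketch(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  // Records a span arrival and returns the traceIds whose inactivity gap has now elapsed.
  List<String> onSpan(String traceId, long timestampMillis) {
    observedTime = Math.max(observedTime, timestampMillis);
    lastSeen.merge(traceId, timestampMillis, Long::max);
    List<String> completed = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (observedTime - entry.getValue() > timeoutMillis) {
        completed.add(entry.getKey());
        it.remove();
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    TraceTimeoutSketch sketch = new TraceTimeoutSketch(60_000L);
    System.out.println(sketch.onSpan("first-trace", 0L));       // nothing is complete yet
    System.out.println(sketch.onSpan("second-trace", 61_000L)); // first-trace has timed out
  }
}
```

In this model the first trace stays open indefinitely if no further record arrives, which matches the test sending a second trace just after the timeout.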

Run with Docker

If you have Docker available, run:

make run-docker

The Docker image will be built and Docker Compose will start.

To test it, run:

make zipkin-test-single
# or
make zipkin-test-distributed

Examples

Single-node: span partitioning, aggregation, and storage happen on the same containers.
Distributed-mode: partitioning and aggregation run in a different container than storage.
Only-dependencies: only the components needed to support aggregation and search of dependency graphs.

Acknowledgments

This project is inspired by Adrian Cole's VoltDB storage https://github.com/adriancole/zipkin-voltdb

Kafka Streams images are created with https://zz85.github.io/kafka-streams-viz/

Artifacts

All artifacts are published under the group ID "io.zipkin.contrib.zipkin-storage-kafka". We use a common release version for all components.

Library Releases

Releases are available at Sonatype and Maven Central.

Library Snapshots

Snapshots are uploaded to Sonatype after commits to master.

Docker Images

Released versions of zipkin-storage-kafka are published to GitHub Container Registry as beta.zipkin.io/openzipkin-contrib/zipkin-storage-kafka. See docker for details.