
Kafka Streams - Word count demo program

This is an example from the official Kafka documentation that demonstrates how to write a streaming application with the Kafka Streams client library.

The source code is found here.
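
For reference, the topology is essentially the official WordCount example: split each line into words, group by word, and count. Below is a minimal sketch of what the main class could look like; the class name and topic names come from the steps below, but the actual repository source is authoritative and may differ in detail:

    package com.flyer.kafka;

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class WordCountDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "kafka-stream-wordcount");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "127.0.0.1:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Split each input line into lower-cased words, group by word,
            // and count occurrences per word.
            KTable<String, Long> counts = builder.<String, String>stream("kafka-stream-wordcount-input")
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                .groupBy((key, word) -> word)
                .count();
            // Publish the stream of count updates with a Long value serde,
            // matching the LongDeserializer used by the console consumer in step 6.
            counts.toStream().to("kafka-stream-wordcount-output",
                Produced.with(Serdes.String(), Serdes.Long()));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }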

Set up and run

  1. Start the local Zookeeper and Kafka servers
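     For example, with the Confluent Platform launcher scripts used elsewhere in this README (the properties file paths are typical defaults and may differ per installation):
    zookeeper-server-start etc/kafka/zookeeper.properties
    kafka-server-start etc/kafka/server.properties
    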
  2. Initiate a topic for the input stream
    kafka-topics --create \
        --zookeeper 127.0.0.1:2181 \
        --replication-factor 1 \
        --partitions 1 \
        --topic kafka-stream-wordcount-input
    
  3. Initiate a topic for the output stream; cleanup.policy=compact fits here because the output is a changelog in which only the latest count per word matters
    kafka-topics --create \
        --zookeeper 127.0.0.1:2181 \
        --replication-factor 1 \
        --partitions 1 \
        --topic kafka-stream-wordcount-output \
        --config cleanup.policy=compact
    
  4. Run the word count demo Java application
    mvn clean package
    mvn exec:java -Dexec.mainClass=com.flyer.kafka.WordCountDemo
    
  5. Start a producer to write data to the input stream topic
    kafka-console-producer --broker-list 127.0.0.1:9092 --topic kafka-stream-wordcount-input
    
  6. Start a consumer to observe results from the output stream topic
    kafka-console-consumer --bootstrap-server 127.0.0.1:9092 \
        --topic kafka-stream-wordcount-output \
        --from-beginning \
        --formatter kafka.tools.DefaultMessageFormatter \
        --property print.key=true \
        --property print.value=true \
        --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \
        --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
    
  7. As you type lines of text into the producer, the consumer continuously displays updated word counts, as in the example below
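
For example, producing the line "all streams lead to kafka" should eventually make the consumer print tab-separated key/value pairs like the following (illustrative output; how updates are batched and deduplicated depends on the caching behavior described in the notes below):

    all        1
    streams    1
    lead       1
    to         1
    kafka      1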

Note

  1. This demo application is written with the Kafka Streams DSL. It would be worthwhile to rewrite it with the low-level Processor API; a sketch of what that could look like is included at the end of this README.
  2. The default count aggregation materializes a state store whose cache is flushed at a configurable commit frequency. Because of the commit.interval.ms config setting, the aggregation results in the demo show up with a delay. This is explained well in this SO entry and this article; the config sketch below the quote shows how to shorten the delay.

    The semantics of caching is that data is flushed to the state store and forwarded to the next downstream processor node whenever the earliest of commit.interval.ms or cache.max.bytes.buffering (cache pressure) hits.
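
To observe results sooner while experimenting, both knobs can be tightened in the application's Properties before building the KafkaStreams instance. A minimal sketch with illustrative values; note that a zero-byte cache disables record caching entirely, trading deduplication for more, finer-grained output records:

    // Commit (and thus flush the state store cache) every second
    // instead of the default 30 seconds.
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
    // Disable the record cache so every count update is forwarded immediately.
    props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);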
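
As for rewriting the demo with the Processor API (note 1 above), here is a minimal sketch of one possible shape, using the classic pre-2.7 Processor interface from the Kafka Streams 2.x era this demo targets. The processor class, the "Counts" store name, and the one-second punctuation interval are illustrative assumptions, not code from this repository:

    package com.flyer.kafka;

    import java.time.Duration;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class WordCountProcessor implements Processor<String, String> {
        private ProcessorContext context;
        private KeyValueStore<String, Long> store;

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            this.context = context;
            this.store = (KeyValueStore<String, Long>) context.getStateStore("Counts");
            // Unlike the DSL, emitting results is explicit here: forward the
            // current counts downstream once per second of stream time.
            context.schedule(Duration.ofSeconds(1), PunctuationType.STREAM_TIME, timestamp -> {
                try (KeyValueIterator<String, Long> it = store.all()) {
                    while (it.hasNext()) {
                        KeyValue<String, Long> entry = it.next();
                        context.forward(entry.key, entry.value);
                    }
                }
            });
        }

        @Override
        public void process(String key, String line) {
            // Update the running count for every word in the line.
            for (String word : line.toLowerCase().split("\\W+")) {
                Long count = store.get(word);
                store.put(word, count == null ? 1L : count + 1);
            }
        }

        @Override
        public void close() {}
    }

The processor is then wired into a Topology by hand, which is roughly the plumbing the DSL generates automatically:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.state.Stores;

    Topology topology = new Topology();
    topology.addSource("Source", "kafka-stream-wordcount-input");
    topology.addProcessor("Process", WordCountProcessor::new, "Source");
    topology.addStateStore(
        Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("Counts"), Serdes.String(), Serdes.Long()),
        "Process");
    topology.addSink("Sink", "kafka-stream-wordcount-output",
        Serdes.String().serializer(), Serdes.Long().serializer(), "Process");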