Primary language: Java. License: Apache License 2.0 (Apache-2.0).

KafkaStreamsDemo

Kafka Streams demo code that was used during the Streaming webinar of the OpenCore/SAP webinar series. The code reads tweets in Avro format from a Kafka topic and applies stream processing rules to them to produce the following:

  • Raw word count - every occurrence of individual words is counted and written to the topic wordcount (a predefined list of stopwords will be ignored)
  • 5-Minute word count - words are counted per 5 minute window and every word that has more than 3 occurrences is written to the topic wordcount5m
  • Buzzwords - a list of special interest words can be defined and those will be tracked in the topic buzzwords
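The three rules above can be illustrated with a minimal plain-Java sketch. It deliberately avoids the Kafka Streams API so that it runs without a cluster, and it is not the demo's actual implementation. The class name TweetWordRules, the two word lists and the tokenizer are hypothetical. Reading "tracked" as "counted" for buzzwords is also an assumption. A streaming job would emit updated counts continuously, whereas this sketch computes the final counts for a fixed batch of tweets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java illustration of the three counting rules; no Kafka dependency. */
final class TweetWordRules {
    // Hypothetical word lists; the demo's real lists are not part of this README.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "is", "and", "to"));
    static final Set<String> BUZZWORDS = new HashSet<>(Arrays.asList("kafka", "streaming"));

    static final long WINDOW_MS = 5 * 60 * 1000L; // 5-minute window
    static final long MIN_COUNT = 3;              // a word needs more than 3 occurrences

    /** Minimal stand-in for a tweet: event time in epoch milliseconds plus text. */
    static final class Tweet {
        final long timestampMs;
        final String text;
        Tweet(long timestampMs, String text) {
            this.timestampMs = timestampMs;
            this.text = text;
        }
    }

    /** Lower-cases the text, splits on non-word characters and drops stopwords. */
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    /** Start of the 5-minute window that contains the given timestamp. */
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
    }

    /** Rule 1: raw word count over all tweets. */
    static Map<String, Long> rawCounts(List<Tweet> tweets) {
        Map<String, Long> counts = new TreeMap<>();
        for (Tweet t : tweets) {
            for (String w : words(t.text)) {
                counts.merge(w, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Rule 2: count per window, keeping only words with more than MIN_COUNT occurrences. */
    static Map<Long, Map<String, Long>> frequentWordsPerWindow(List<Tweet> tweets) {
        Map<Long, Map<String, Long>> perWindow = new TreeMap<>();
        for (Tweet t : tweets) {
            Map<String, Long> counts =
                    perWindow.computeIfAbsent(windowStart(t.timestampMs), k -> new TreeMap<>());
            for (String w : words(t.text)) {
                counts.merge(w, 1L, Long::sum);
            }
        }
        for (Map<String, Long> counts : perWindow.values()) {
            counts.values().removeIf(c -> c <= MIN_COUNT);
        }
        perWindow.values().removeIf(Map::isEmpty); // drop windows with no frequent words
        return perWindow;
    }

    /** Rule 3 (assumed to mean counting): raw counts restricted to the buzzword list. */
    static Map<String, Long> buzzwordCounts(List<Tweet> tweets) {
        Map<String, Long> counts = rawCounts(tweets);
        counts.keySet().retainAll(BUZZWORDS);
        return counts;
    }
}
```

Each method corresponds to one output topic: rawCounts to wordcount, frequentWordsPerWindow to wordcount5m and buzzwordCounts to buzzwords. The tokenizer splits on anything outside letters, digits and underscore, so hashtags lose their leading # and non-ASCII words are broken up; a real job would likely need a more careful tokenizer.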

Setup

To run this code you need a machine or cluster of machines running the Confluent Platform. OpenCore also provides a virtual machine that can be used to run this code with little setup effort.

A Kafka Connect connector for Twitter that was used as the basis for this demo can be found at https://github.com/Eneco/kafka-connect-twitter. Please configure it to write to the topic twitter.

To run the code, compile it into a fat jar file and deploy it to any machine that has access to the Kafka cluster. Then run it with:

java -cp KafkaStreamsDemo-1.0-SNAPSHOT-jar-with-dependencies.jar com.opencore.sapwebinarseries.KafkaStreamsDemo

To set the IP address of ZooKeeper, Kafka and the Schema Registry, set the environment variable KAFKAIP; the code reads this variable and uses its value for all three services. If the variable is not set, the code defaults to 127.0.0.1.
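As a rough illustration of this lookup, the following sketch resolves the host from KAFKAIP and derives the three service addresses from it. The class name DemoEndpoints is hypothetical and this is not the demo's actual code. Port 9092 matches the consumer commands in this README, while ports 2181 (ZooKeeper) and 8081 (Schema Registry) are the usual defaults and are assumed here. Treating a blank value like an unset one is also an assumption.

```java
import java.util.Map;
import java.util.Properties;

/** Resolves the service addresses from the KAFKAIP environment variable. */
final class DemoEndpoints {
    static final String ENV_VAR = "KAFKAIP";
    static final String DEFAULT_HOST = "127.0.0.1";

    /** Returns the configured host, or the default if the variable is unset or blank. */
    static String host(Map<String, String> env) {
        String value = env.get(ENV_VAR);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_HOST;
        }
        return value.trim();
    }

    /** Builds the connection settings for all three services on the same host. */
    static Properties endpoints(Map<String, String> env) {
        String h = host(env);
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", h + ":9092");              // port used in this README
        props.setProperty("zookeeper.connect", h + ":2181");              // assumed default port
        props.setProperty("schema.registry.url", "http://" + h + ":8081"); // assumed default port
        return props;
    }
}
```

Calling DemoEndpoints.endpoints(System.getenv()) uses the real environment. Passing the environment in as a map keeps the lookup easy to test without changing process state.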

Getting the data

Accessing the data generated by the code is as simple as starting the console consumer that ships with Kafka. Replace KAFKAIP in the commands below with the address of your Kafka broker.

kafka-console-consumer --topic wordcount --new-consumer --bootstrap-server KAFKAIP:9092 --property print.key=true
kafka-console-consumer --topic wordcount5m --new-consumer --bootstrap-server KAFKAIP:9092 --property print.key=true
kafka-console-consumer --topic buzzwords --new-consumer --bootstrap-server KAFKAIP:9092 --property print.key=true