This is WIP.
Current infrastructure:
- Tweets are serialized to Avro (without code generation) and sent to Kafka (see the sketch after this list)
- A Kafka consumer picks up serialized Tweets and prints them to stdout
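Serializing "without code generation" means working with Avro's `GenericRecord` API rather than classes generated from the schema. Below is a minimal sketch of that encoding, assuming a hypothetical two-field tweet schema (`id`, `text`); the project's actual schema is not shown in this README and may differ.

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object TweetAvro {
  // Hypothetical schema for illustration; the project's real schema may differ.
  val schema: Schema = new Schema.Parser().parse(
    """{"type": "record", "name": "Tweet", "fields": [
      |  {"name": "id",   "type": "long"},
      |  {"name": "text", "type": "string"}
      |]}""".stripMargin)

  // Encode one tweet to Avro binary without any generated classes.
  def serialize(id: Long, text: String): Array[Byte] = {
    val record: GenericRecord = new GenericData.Record(schema)
    record.put("id", id)
    record.put("text", text)

    val out     = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    out.toByteArray
  }
}
```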
- Get Twitter credentials and fill them in `reference.conf` (an example layout follows these steps).
- Start Kafka (instructions) in single-node mode on localhost.
- Start the Kafka producer (see the producer sketch after these steps):

  ```
  ./gradlew produce
  ```

  This starts reading recent tweets, encodes them to Avro, and sends them to the Kafka cluster in binary format (`Array[Byte]`).
- Start the Kafka consumer (see the consumer sketch after these steps):

  ```
  ./gradlew consume
  ```

  This runs Spark Streaming connected to the Kafka cluster. In 5-second intervals the program reads Avro tweets from Kafka, deserializes the tweet texts to strings, and prints the 10 most frequent words.
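For the credentials step, a possible `reference.conf` layout is sketched below. The key names are assumptions for illustration; the actual file in this project may use different ones.

```
# Hypothetical layout; the actual key names in this project may differ.
twitter {
  consumerKey       = "..."
  consumerSecret    = "..."
  accessToken       = "..."
  accessTokenSecret = "..."
}
```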
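On the producer side, the Avro-encoded `Array[Byte]` payloads can be sent with the plain Kafka producer API. A minimal sketch, assuming a local broker on `localhost:9092`, a hypothetical `tweets` topic name, and the `TweetAvro` helper from the sketch above:

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object Produce {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")

    val producer = new KafkaProducer[String, Array[Byte]](props)
    // Avro-encode a tweet (using the TweetAvro sketch above) and ship the raw bytes.
    val bytes = TweetAvro.serialize(1L, "hello from kafka")
    producer.send(new ProducerRecord[String, Array[Byte]]("tweets", bytes))
    producer.close()
  }
}
```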
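On the consumer side, a minimal sketch of the Spark Streaming job described in the last step: 5-second batches, Avro deserialization of the tweet text, and a per-batch top-10 word count. It assumes the `spark-streaming-kafka-0-10` integration and the same hypothetical topic and schema as above; the project's actual dependencies and topic name may differ.

```scala
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Consume {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("tweet-word-count").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[ByteArrayDeserializer],
      "group.id"           -> "tweet-consumer")

    val stream = KafkaUtils.createDirectStream[String, Array[Byte]](
      ssc, PreferConsistent, Subscribe[String, Array[Byte]](Seq("tweets"), kafkaParams))

    // Deserialize each Avro payload back to the tweet's text
    // (schema from the TweetAvro sketch above).
    val texts = stream.map { record =>
      val reader  = new GenericDatumReader[GenericRecord](TweetAvro.schema)
      val decoder = DecoderFactory.get().binaryDecoder(record.value(), null)
      reader.read(null, decoder).get("text").toString
    }

    // Per 5-second batch: split into words, count, and print the 10 most frequent.
    texts.flatMap(_.toLowerCase.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .foreachRDD { rdd =>
        rdd.sortBy(_._2, ascending = false).take(10).foreach(println)
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```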