Basic sentiment analysis of real-time tweets using Apache Kafka, a queuing service for data streams.

Initialization Steps:

Download and extract the twitter data zip file:

Download the file from here:


Start the ZooKeeper service:

$KAFKA_HOME/bin/ $KAFKA_HOME/config/

Start the Kafka service:

$KAFKA_HOME/bin/ $KAFKA_HOME/config/

Create a topic named twitterstream in Kafka:

$KAFKA_HOME/bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitterstream

List the existing topics with:

$KAFKA_HOME/bin/ --list --zookeeper localhost:2181

Using the Streaming API:

To stream the tweets and push them to the Kafka queue, we have provided a Python script.

To stream tweets, we will read tweets from a file and push them to the twitterstream topic in Kafka. Do this by running our program as follows:

$ python

Note: this program must be running while you run your portion of the assignment; otherwise you will not receive any tweets.
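
For orientation, here is a minimal sketch of what a file-to-Kafka streamer of this kind could look like. It is not the provided script. It assumes the kafka-python package, a broker at localhost:9092, and a hypothetical input file tweets.txt with one tweet per line; the function names are made up for illustration.

```python
def push_tweets(lines, send, topic="twitterstream"):
    """Send each non-empty line to `topic` through `send`; return how many were sent."""
    sent = 0
    for line in lines:
        tweet = line.strip()
        if not tweet:
            continue  # skip blank lines in the input file
        send(topic, tweet.encode("utf-8"))  # Kafka message values are bytes
        sent += 1
    return sent


def main(path="tweets.txt", broker="localhost:9092"):
    # Assumptions: `path` and `broker` are placeholders, adjust to your setup.
    from kafka import KafkaProducer  # kafka-python, imported only when needed

    producer = KafkaProducer(bootstrap_servers=broker)
    with open(path, encoding="utf-8") as f:
        count = push_tweets(f, producer.send)
    producer.flush()  # make sure buffered messages reach the broker
    print(f"pushed {count} tweets to twitterstream")
```

Calling main() with ZooKeeper and Kafka running would push the file's contents to the twitterstream topic. Keeping push_tweets independent of the producer makes it easy to try out without a broker.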

To check if the data is landing in Kafka:

$KAFKA_HOME/bin/ --zookeeper localhost:2181 --topic twitterstream --from-beginning

Running the Stream Analysis Program (after finishing the project requirements):

$SPARK_HOME/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0
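
As a rough sketch of what the stream analysis could do, the code below counts positive and negative words in tweets. It is only one possible approach under stated assumptions, not the required solution: the word lists, function names, 10-second batch interval, and broker address localhost:9092 are all hypothetical. The Spark wiring assumes the pyspark.streaming.kafka module that matches the spark-streaming-kafka-0-8 package given above.

```python
POSITIVE = {"good", "great", "happy", "love"}  # hypothetical word list
NEGATIVE = {"bad", "sad", "hate", "awful"}     # hypothetical word list


def label_words(tweet):
    """Return a ("positive", 1) or ("negative", 1) pair for each known word."""
    labels = []
    for word in tweet.lower().split():
        if word in POSITIVE:
            labels.append(("positive", 1))
        elif word in NEGATIVE:
            labels.append(("negative", 1))
    return labels


def count_sentiment(tweets):
    """Plain-Python equivalent of the per-batch count, handy for local checks."""
    totals = {"positive": 0, "negative": 0}
    for tweet in tweets:
        for label, n in label_words(tweet):
            totals[label] += n
    return totals


def run_streaming(topic="twitterstream", broker="localhost:9092", batch_seconds=10):
    # Imported lazily so the helpers above work without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="TwitterSentiment")
    ssc = StreamingContext(sc, batch_seconds)
    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": broker})
    # Each Kafka record arrives as a (key, value) pair; the tweet is the value.
    counts = (stream.map(lambda kv: kv[1])
                    .flatMap(label_words)
                    .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the per-batch positive/negative counts
    ssc.start()
    ssc.awaitTermination()
```

For example, count_sentiment(["I love this great day", "bad awful traffic"]) tallies two positive and two negative words. run_streaming() applies the same label_words function to each batch read from Kafka.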