"Sentiment analysis" on a live Twitter feed with Apache Spark and Apache Bahir

Spark Twitter Stream Example

A few lines of code to demo how streaming works with Spark, in particular using the extensions provided by Apache Bahir to read a live stream of tweets. Each tweet is processed and assigned a sentiment score (using a very naive algorithm).
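
To give an idea of what a "very naive" scoring could look like, here is a minimal sketch that counts positive words and subtracts negative ones. The object name NaiveSentiment and both word lists are hypothetical and chosen for illustration only; the scoring used by this project may differ.

```scala
// Illustrative only: the word lists below are assumptions, not the project's lexicon.
object NaiveSentiment {
  val positive: Set[String] = Set("good", "great", "love", "happy", "awesome")
  val negative: Set[String] = Set("bad", "awful", "hate", "sad", "terrible")

  /** Lowercase the text and split it on anything that is not an ASCII letter. */
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  /** Score = number of positive words minus number of negative words. */
  def score(text: String): Int =
    tokenize(text).map { word =>
      if (positive(word)) 1
      else if (negative(word)) -1
      else 0
    }.sum
}
```

A function like this is pure, so it can be applied to each tweet's text with a plain map over the stream.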

To make it work on your installation, add a twitter4j.properties file under src/main/resources that contains the following properties:

oauth.consumerKey=***
oauth.consumerSecret=***
oauth.accessToken=***
oauth.accessTokenSecret=***

Visit apps.twitter.com to get your own API keys.
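
As a rough sketch of how these credentials get used: when no explicit authorization is passed to Bahir's TwitterUtils.createStream, twitter4j falls back to its default configuration, which includes a twitter4j.properties file found on the classpath. The object name TwitterStreamSketch, the application name and the 5-second batch interval are assumptions made for this example, not taken from the project.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Hypothetical entry point, for illustration only.
object TwitterStreamSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to be supplied by spark-submit (--master).
    val conf = new SparkConf().setAppName("twitter-stream-sketch")
    // 5-second batches: an arbitrary choice for this sketch.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Passing None lets twitter4j build its authorization from the default
    // configuration, i.e. the oauth.* entries in twitter4j.properties.
    val tweets = TwitterUtils.createStream(ssc, None)

    // Print the text of the first few tweets of every batch.
    tweets.map(_.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Running this requires valid API keys and the Bahir and twitter4j JARs on the classpath.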

To submit the job to an existing Spark installation you can package the job with the following command:

sbt package

and then submit it with the following command:

$SPARK_HOME/bin/spark-submit \
  --master $SPARK_MASTER \
  --jars $DEPENDENCIES \
  --class me.baghino.spark.streaming.twitter.example.TwitterSentimentScore \
  target/scala-2.11/spark-twitter-stream-example-assembly-1.0.0.jar

The Spark classpath should include org.apache.bahir:spark-streaming-twitter_2.11:2.0.1, org.twitter4j:twitter4j-core:4.0.4 and org.twitter4j:twitter4j-stream:4.0.4.
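
For reference, the same coordinates could be declared in an sbt build roughly as follows. This is a sketch, not the project's actual build.sbt; the exact Scala patch version is an assumption (any 2.11.x release matches the _2.11 artifact suffix).

```scala
// build.sbt (sketch)
scalaVersion := "2.11.8" // assumed patch version

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, giving spark-streaming-twitter_2.11
  "org.apache.bahir" %% "spark-streaming-twitter" % "2.0.1",
  "org.twitter4j"    %  "twitter4j-core"          % "4.0.4",
  "org.twitter4j"    %  "twitter4j-stream"        % "4.0.4"
)
```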

After running the sbt package command, you'll find the required JARs in your local Ivy cache ($HOME/.ivy2/cache/).
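
The value passed to --jars is a comma-separated list of JAR paths. The sketch below builds such a list for the three dependencies above, assuming the usual Ivy cache layout of organisation/module/jars/module-version.jar. The object IvyJars and its methods are hypothetical helpers; check that the resulting files actually exist on your machine before using the output as $DEPENDENCIES.

```scala
// Hypothetical helper: builds the --jars value from an Ivy cache directory.
object IvyJars {
  /** Path of a JAR, assuming the layout <cache>/<org>/<module>/jars/<module>-<version>.jar. */
  def jarPath(cache: String, org: String, module: String, version: String): String =
    s"$cache/$org/$module/jars/$module-$version.jar"

  /** Comma-separated list of the three JARs named in this README. */
  def jarsArgument(cache: String): String =
    Seq(
      jarPath(cache, "org.apache.bahir", "spark-streaming-twitter_2.11", "2.0.1"),
      jarPath(cache, "org.twitter4j", "twitter4j-core", "4.0.4"),
      jarPath(cache, "org.twitter4j", "twitter4j-stream", "4.0.4")
    ).mkString(",")
}
```

For example, IvyJars.jarsArgument(sys.props("user.home") + "/.ivy2/cache") yields a string that can be exported as DEPENDENCIES before calling spark-submit.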