At Spark Summit 2014 [2], there was a hands-on exercise on Spark Streaming with Twitter. I am reproducing this exercise using the latest versions available at present (May 2016).
This is the (almost) complete source code: I filled in the missing part marked "Your code goes here" in the exercise. All that remains is to fill in your own Twitter credentials [6]. The library dependencies (Twitter4J, spark-streaming-twitter) are also included here for convenience.
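As a rough illustration of the credentials step, the sketch below passes the four OAuth values to Twitter4J through JVM system properties (`twitter4j.oauth.consumerKey` and so on), which Twitter4J reads when no explicit authorization object is supplied. The object name `TwitterCredentials` and the method `configure` are hypothetical names chosen for this example, and the project's own helper may be organised differently.

```scala
// Minimal sketch, assuming credentials are supplied as a Map keyed by the
// four OAuth field names. TwitterCredentials/configure are hypothetical names.
object TwitterCredentials {
  // Suffixes of the twitter4j.oauth.* system properties read by Twitter4J.
  val requiredKeys = Seq("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")

  // Sets twitter4j.oauth.<key> for every required key.
  // Fails fast with IllegalArgumentException if a value is missing or blank.
  def configure(credentials: Map[String, String]): Unit = {
    val missing = requiredKeys.filter(k => credentials.getOrElse(k, "").trim.isEmpty)
    require(missing.isEmpty, "Missing Twitter credentials: " + missing.mkString(", "))
    requiredKeys.foreach { k =>
      System.setProperty("twitter4j.oauth." + k, credentials(k).trim)
    }
  }
}
```

A call such as `TwitterCredentials.configure(Map("consumerKey" -> "...", "consumerSecret" -> "...", "accessToken" -> "...", "accessTokenSecret" -> "..."))` would need to run before the Twitter stream is created; the `"..."` values are placeholders for your own keys.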
There are two ways to reproduce it:
- By IntelliJ: Open the project (*.iml), then add references to "spark-assembly-1.6.1-hadoop2.6.0.jar" and to the third-party libraries (spark-streaming-twitter_2.10-1.6.1.jar, twitter4j-async-4.0.4.jar, twitter4j-core-4.0.4.jar, twitter4j-examples-4.0.4.jar, twitter4j-media-support-4.0.4.jar, twitter4j-stream-4.0.4.jar). Then run the project.
- By command line: Build the artifact (SparkStreamingWithTwitter.jar), then run the command: spark-submit.cmd --class "Tutorial" --jars "spark-streaming-twitter_2.10-1.6.1.jar,twitter4j-async-4.0.4.jar,twitter4j-core-4.0.4.jar,twitter4j-examples-4.0.4.jar,twitter4j-media-support-4.0.4.jar,twitter4j-stream-4.0.4.jar" "SparkStreamingWithTwitter.jar"
Note: If the additional jars cannot be found, use absolute paths.
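If the jars are not found, one way to avoid typing absolute paths by hand is to generate the --jars value. The sketch below is only an illustration: the object name `SubmitCommand`, the method `jarsArgument`, and the default folder `lib` are assumptions made for this example, not part of the project.

```scala
import java.io.File

// Minimal sketch: build the comma-separated --jars value from absolute paths.
// SubmitCommand, jarsArgument and the "lib" folder are hypothetical.
object SubmitCommand {
  // Resolves each jar name against libDir and joins the absolute paths with
  // commas, the separator that spark-submit expects for --jars.
  def jarsArgument(libDir: File, jarNames: Seq[String]): String =
    jarNames.map(name => new File(libDir, name).getAbsolutePath).mkString(",")

  def main(args: Array[String]): Unit = {
    // Folder holding the dependency jars; pass it as the first argument,
    // otherwise the assumed default "lib" is used.
    val libDir = new File(if (args.nonEmpty) args(0) else "lib")
    val jars = Seq(
      "spark-streaming-twitter_2.10-1.6.1.jar",
      "twitter4j-async-4.0.4.jar",
      "twitter4j-core-4.0.4.jar",
      "twitter4j-examples-4.0.4.jar",
      "twitter4j-media-support-4.0.4.jar",
      "twitter4j-stream-4.0.4.jar")
    // Prints the full command so it can be copied into a terminal.
    println("spark-submit.cmd --class \"Tutorial\" --jars \"" +
      jarsArgument(libDir, jars) + "\" \"SparkStreamingWithTwitter.jar\"")
  }
}
```

Running it with the folder that contains the jars as its only argument prints the same spark-submit command as above, with every dependency expanded to an absolute path.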
- Apache Spark v1.6.1
- IntelliJ IDEA Community Edition 2016.1.2 (with Scala plugin)
- Scala v2.10.5
- Java SDK v1.8.0_92 64-bit
- spark-streaming-twitter_2.10-1.6.1.jar [5]
- Twitter4J v4.0.4 [4]
- Windows 8.1 64-bit