Apache Spark Examples

Prerequisites

Java version 7
sbt version 0.13.8
Scala version 2.10.5
Spark version 1.3.1

Basic Map Function

Basic Average with Aggregate Function
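One way to compute an average with aggregate is to carry a (sum, count) pair through the computation. The sketch below is plain Scala with no Spark dependency, so it runs anywhere. The names AverageSketch, seqOp, combOp and the two-partition layout are illustrative assumptions, not code from this repository.

```scala
// Plain Scala, no Spark needed: seqOp and combOp have the shapes that
// RDD.aggregate expects for its two function arguments.
object AverageSketch {
  // Accumulator type: (running sum, running count).
  type Acc = (Int, Int)

  val zero: Acc = (0, 0)

  // seqOp: fold one element into a partition-local accumulator.
  def seqOp(acc: Acc, x: Int): Acc = (acc._1 + x, acc._2 + 1)

  // combOp: merge the accumulators of two partitions.
  def combOp(a: Acc, b: Acc): Acc = (a._1 + b._1, a._2 + b._2)

  // Hypothetical stand-in for an RDD split into two partitions.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6))

  // Mimic aggregate: fold each partition, then merge the partial results.
  val (sum, count) = partitions.map(_.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)

  val average: Double = sum.toDouble / count

  def main(args: Array[String]): Unit = println(average) // 3.5
}
```

In spark-shell the same two functions could be passed directly, for example sc.parallelize(List(1,2,3,4,5,6), 2).aggregate((0, 0))(seqOp, combOp), and the average is then the sum divided by the count.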

WordCount Example -- No Dependencies -- Assembly not required.

spark-submit \
--class com.wordcount.example.WordCount \
--driver-memory "3g" \
target/scala-2.10/dfw-spark-meetup_2.10-0.0.1.jar \
/Users/shona/IdeaProjects/apache-spark-examples/data/pg4300.txt \
/Users/shona/IdeaProjects/apache-spark-examples/data/wordcount

Top 5 User Count Stack Overflow -- No Dependencies -- Assembly not required.

spark-submit \
--class com.stackoverflow.example.UserCount \
--master "local[1]" \
--driver-memory "3g" \
target/scala-2.10/dfw-spark-meetup_2.10-0.0.1.jar \
/Users/shona/IdeaProjects/apache-spark-examples/data/Users.xml

Naive Bayes Tweet Sentiment Analysis -- Lucene Text Analyzer -- Assembly required.

spark-submit \
--class com.twitter.example.NaiveBayesClassifier \
--driver-memory "5g" \
target/scala-2.10/dfw-spark-meetup-assembly-0.0.1.jar

Twitter Streaming -- Apache Spark Example -- Printing on Console

Run from the Spark home directory.

spark-submit \
--class org.apache.spark.examples.streaming.TwitterPopularTags \
--master "local[2]" \
--driver-memory "3g" \
lib/spark-examples-1.3.1-hadoop2.6.0.jar \
<consumer key> <consumer secret> <access token> <access token secret>

Explain and Discuss Aggregate Function

val numbers = sc.parallelize(List(1,2,3,4,5,6), 2)
numbers.aggregate(0)(math.max(_, _), _ + _)
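The call above uses math.max inside each partition and + across partitions, so the result depends on how the data is split. The following plain-Scala trace, which needs no Spark, mimics that evaluation. It assumes the six elements are sliced evenly into (1,2,3) and (4,5,6); the object name AggregateTrace is made up for illustration.

```scala
object AggregateTrace {
  // Assumed partition layout for parallelize(List(1,2,3,4,5,6), 2).
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5, 6))

  // seqOp: within each partition keep the maximum, starting from the zero value 0.
  val perPartitionMax: Seq[Int] =
    partitions.map(_.foldLeft(0)((a, b) => math.max(a, b)))

  // combOp: add the per-partition results together, again starting from 0.
  val result: Int = perPartitionMax.foldLeft(0)(_ + _)

  def main(args: Array[String]): Unit = {
    println(perPartitionMax) // List(3, 6)
    println(result)          // 9
  }
}
```

With a single partition the same call would return 6 instead, which makes this a useful example for discussing how the zero value, seqOp and combOp interact.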

Explain and Discuss RDD and toDebugString

val file = sc.textFile("README.md")
val containsSpark = file.filter(line => line.contains("Spark"))
val words = containsSpark.flatMap(line => line.split(" "))
val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
counts.toDebugString
counts.collect()

Explain and Discuss Accumulator

val accum = sc.accumulator(0, "Test Accumulator")
sc.parallelize(Array(7, 8, 9, 10)).foreach(x => accum += x)
accum.value  // read on the driver: 7 + 8 + 9 + 10 = 34

Explain and Discuss Broadcast

val broadcastVar = sc.broadcast(Array(1, 2, 3))
broadcastVar.value

Sample Data Download Links