distributed-computing-spark

Slides and samples used in the Distributed Computing with Spark talk.

Primary language: Scala. License: Apache License 2.0 (Apache-2.0).

Sample1: Most retweeted

The first example is a simple snippet that finds the most retweeted tweet in a collection of tweets. It also explores some options for deploying an embedded Spark cluster and demonstrates some basic Spark features.
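As an illustration of the idea only (not the repository's actual code), here is a minimal sketch using the RDD API. It assumes a hypothetical `Tweet` case class whose `retweetOf` field holds the id of the original tweet when the tweet is a retweet, uses a small in-memory sample instead of a real tweet source, and interprets "embedded cluster" as Spark's in-process `local[*]` master. The names `Tweet`, `retweetOf` and `MostRetweeted` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record: `retweetOf` holds the id of the original tweet when
// this tweet is a retweet. The real sample's input format may differ.
final case class Tweet(id: Long, text: String, retweetOf: Option[Long])

object MostRetweeted {
  /** Returns (originalTweetId, retweetCount) for the most retweeted tweet, if any. */
  def mostRetweeted(spark: SparkSession, tweets: Seq[Tweet]): Option[(Long, Int)] = {
    val counts = spark.sparkContext
      .parallelize(tweets)
      .flatMap(_.retweetOf.toList)       // keep only retweets, emitting the original id
      .map(originalId => (originalId, 1))
      .reduceByKey(_ + _)                // retweets per original tweet, combined across partitions
    // top(1) with an ordering on the count avoids sorting the whole dataset.
    counts.top(1)(Ordering.by[(Long, Int), Int](_._2)).headOption
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" starts an embedded, in-process Spark cluster using all local cores.
    val spark = SparkSession.builder()
      .appName("MostRetweeted")
      .master("local[*]")
      .getOrCreate()
    try {
      // Hypothetical sample data: tweet 2 is retweeted twice, tweet 1 once.
      val tweets = Seq(
        Tweet(1L, "original A", None),
        Tweet(2L, "original B", None),
        Tweet(3L, "RT A", Some(1L)),
        Tweet(4L, "RT B", Some(2L)),
        Tweet(5L, "RT B", Some(2L))
      )
      println(mostRetweeted(spark, tweets)) // prints Some((2,2))
    } finally {
      spark.stop()
    }
  }
}
```

If several tweets share the highest count, `top(1)` returns just one of them, so a real implementation may want an explicit tie-break (for example on the tweet id).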

Sample2: Most retweeted (with SparkSQL)

The same example as before, but written with Spark SQL syntax.
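A minimal sketch of the same computation expressed as a Spark SQL query. It is illustrative only and assumes a hypothetical `Tweet` case class (with `retweetOf` set to the original tweet's id for retweets), an in-memory sample registered as a temporary view named `tweets`, and a local embedded Spark session; none of these names come from the repository.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema: `retweetOf` is the id of the original tweet for
// retweets and None (SQL NULL) otherwise.
final case class Tweet(id: Long, text: String, retweetOf: Option[Long])

object MostRetweetedSql {
  /** Returns (originalTweetId, retweetCount) for the most retweeted tweet, if any. */
  def mostRetweeted(spark: SparkSession, tweets: Seq[Tweet]): Option[(Long, Long)] = {
    import spark.implicits._
    // Expose the in-memory sample to SQL under the (hypothetical) view name "tweets".
    tweets.toDF().createOrReplaceTempView("tweets")
    val result = spark.sql(
      """SELECT retweetOf AS originalId, COUNT(*) AS retweets
        |FROM tweets
        |WHERE retweetOf IS NOT NULL
        |GROUP BY retweetOf
        |ORDER BY retweets DESC
        |LIMIT 1""".stripMargin)
    // COUNT(*) yields a bigint column, so both fields are read as Long.
    result.collect().headOption.map(row => (row.getLong(0), row.getLong(1)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MostRetweetedSql")
      .master("local[*]") // embedded, in-process cluster
      .getOrCreate()
    try {
      val tweets = Seq(
        Tweet(1L, "original A", None),
        Tweet(2L, "original B", None),
        Tweet(3L, "RT A", Some(1L)),
        Tweet(4L, "RT B", Some(2L)),
        Tweet(5L, "RT B", Some(2L))
      )
      println(mostRetweeted(spark, tweets)) // prints Some((2,2))
    } finally {
      spark.stop()
    }
  }
}
```

The query mirrors the RDD-style steps (filter, group, count, pick the maximum), but Spark's optimizer chooses the physical plan. As with any `ORDER BY ... LIMIT 1`, ties on the count are broken arbitrarily.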

How to run

sbt run