An example Spark application with SBT configuration and unit tests. Implements the obligatory "Word Count" example in Spark.
/build.sbt
- SBT configuration for the project. Includes spark-core and scalatest as dependencies. Sets fork := true, to work around an issue with the class loader when running or testing Spark applications in SBT.
/project/build.properties
- Sets the SBT version to use.
/src/main/resources/log4j.properties
- Sets logging level to WARN, to minimise log output when running or testing Spark applications.
/src/main/scala/SparkFixtures.scala
- Fixtures for creating a SparkContext, passing it into a closure, and then stopping the SparkContext once the closure has finished. Re-used in application code and tests.
/src/main/scala/WordCount.scala
- Pure function on RDDs for counting the occurrences of words in a multi-line input.
/src/main/scala/WordCountApp.scala
- Application that accepts a file name as a command-line argument, starts a SparkContext, reads the file, calculates the word count, and prints the results to stdout. It can run in local mode with hard-coded configuration (master = "local[*]"), or remote mode with configuration supplied from the environment or spark-submit.
/src/test/scala/WordCountSpec.scala
- Tests the pure code in WordCount using ScalaTest and the fixtures in SparkFixtures.
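The split between the fixture and the pure function described above can be sketched roughly as follows. This is an illustration only, not the repository's code: the object and method names (SparkFixturesSketch, withSparkContext, withLocalSparkContext, WordCountSketch, count), the application name, and the choice to split words on whitespace are all assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical shape of the fixture; the real SparkFixtures may differ.
object SparkFixturesSketch {
  // Loan pattern: create a SparkContext, pass it to the closure,
  // and stop it afterwards even if the closure throws.
  def withSparkContext[A](conf: SparkConf)(body: SparkContext => A): A = {
    val sc = new SparkContext(conf)
    try body(sc)
    finally sc.stop()
  }

  // Local mode with hard-coded configuration (master = "local[*]").
  def withLocalSparkContext[A](body: SparkContext => A): A =
    withSparkContext(
      new SparkConf().setMaster("local[*]").setAppName("word-count-sketch")
    )(body)
}

// Hypothetical shape of the pure word count; tokenising on whitespace is an assumption.
object WordCountSketch {
  // Takes an RDD of lines and returns (word, occurrences) pairs.
  // It neither creates nor stops a SparkContext.
  def count(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
}

object WordCountSketchDemo {
  def main(args: Array[String]): Unit = {
    // Results are collected inside the closure, because the RDD
    // cannot be evaluated once the SparkContext has been stopped.
    val counts = SparkFixturesSketch.withLocalSparkContext { sc =>
      WordCountSketch.count(sc.parallelize(Seq("a b a", "b c"))).collect().toMap
    }
    counts.foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

Because count only transforms an RDD, a test can hand it a small in-memory RDD built with parallelize, while the application hands it the lines of a file. Both can share the same fixture.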
To compile:
sbt compile
To run tests:
sbt test
To run locally:
sbt 'run --local FILE'
To run remotely:
sbt package
spark-submit target/scala-2.11/word-count_2.11-1.0.jar --remote README.md
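
The jar path above implies a project named word-count at version 1.0, built against Scala 2.11. A build.sbt consistent with that path and with the description of /build.sbt might look like the sketch below. The exact Scala, Spark, and ScalaTest version numbers, and the test scope on scalatest, are assumptions rather than values taken from the repository.

```scala
// build.sbt (sketch; version numbers are assumptions)
name := "word-count"
version := "1.0"
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",
  // Test scope is an assumption; the description only says scalatest is a dependency.
  "org.scalatest" %% "scalatest" % "3.0.8" % "test"
)

// Run and test in a forked JVM, to work around the class loader issue with Spark under SBT.
fork := true
```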