- download the data to
$DATA_HOME
- install sbt (http://www.scala-sbt.org/)
- clone the repository; it can then be opened in IntelliJ IDEA 13+ as a Scala sbt project (see the wiki).
The structure of the project is described here http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications.
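As a sketch, the layout follows that guide; the source file name here is an assumption, inferred from the "Baseline" class submitted below:

```
sh2016/
├── build.sbt
└── src/
    └── main/
        └── scala/
            └── Baseline.scala   (assumed name, matching --class "Baseline")
```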
git clone <repo>
- go to the directory where you cloned the repository and enter it
cd $PROJECT_HOME/sh2016
- build project
sbt package
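For reference, a minimal build.sbt consistent with the jar produced below (baseline_2.10-1.0.jar) might look like the following sketch; the exact Scala 2.10 patch version is an assumption:

```scala
// Hypothetical build.sbt, inferred from the jar name baseline_2.10-1.0.jar
name := "baseline"

version := "1.0"

scalaVersion := "2.10.6"  // assumption: any 2.10.x release should work

// Spark 1.6.0 matches the recommended download below; marked "provided"
// because spark-submit supplies the Spark classes at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"
```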
- download Spark (http://spark.apache.org/downloads.html). The recommended version is spark-1.6.0-bin-hadoop2.6.tgz.
- go to the directory where you downloaded Spark
cd $SPARK_HOME
- unpack Spark
tar -xvzf <spark>.tgz
- submit the jar you built in the build step via spark-submit (the configuration below is for 4 cores)
$SPARK_HOME/bin/spark-submit --class "Baseline" --master local[4] --driver-memory 4G $PROJECT_HOME/sh2016/target/scala-2.10/baseline_2.10-1.0.jar $DATA_HOME