Spark-Scala Sbt Template

Simple template to get up and running with a spark-submit job on a cluster in under 10 minutes :P

This is a sample Spark job written in Scala that runs a computation over dummy sales data, provided as a CSV file. See how to run locally below to trigger a test run.
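The actual job code lives in the repo source; as a rough sketch of what a ProfitCalc-style job might look like (the column names item, cost, and revenue are illustrative assumptions, not the template's real schema):

  package com.adnan.emr

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions._

  // Hypothetical sketch: args(0) = input CSV path, args(1) = output directory.
  object ProfitCalc {
    def main(args: Array[String]): Unit = {
      val inputPath  = args(0)
      val outputPath = args(1)

      val spark = SparkSession.builder()
        .appName("ProfitCalc")
        .getOrCreate()

      // Read the sales CSV with a header row and inferred column types.
      val sales = spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(inputPath)

      // Compute total profit per item (assumed columns: item, cost, revenue).
      val profit = sales
        .withColumn("profit", col("revenue") - col("cost"))
        .groupBy("item")
        .agg(sum("profit").as("total_profit"))

      // Write the result; the local run below points this at target/scala-2.12/output.
      profit.write.mode("overwrite").option("header", "true").csv(outputPath)

      spark.stop()
    }
  }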

Requirements:

Scala, sbt (plus Apache Spark to run locally; see below)

Configuration

Default settings are in the build.sbt file; change them according to your project. A fuller sketch follows after this snippet.
name := "SampleSparkSubmitTemplate",
version := "2.0",
scalaVersion := "2.12.0",
assemblyJarName in assembly := "Adnan.jar",
mainClass in Compile := Some("com.adnan.emr")
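A build.sbt for a project like this typically also declares the Spark dependencies and an assembly merge strategy. A minimal sketch, assuming Spark 2.4.x on Scala 2.12 with the sbt-assembly plugin (the dependency versions and merge strategy are illustrative assumptions, not the template's exact settings):

  name := "SampleSparkSubmitTemplate"

  version := "2.0"

  scalaVersion := "2.12.0"

  // Spark is marked "provided" because spark-submit supplies it at runtime;
  // the Spark version here is an assumption.
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.8" % "provided",
    "org.apache.spark" %% "spark-sql"  % "2.4.8" % "provided"
  )

  // The JAR name should match whatever your spark-submit command points at.
  assemblyJarName in assembly := "Adnan.jar"

  // Fully-qualified main class; the spark-submit example below uses com.adnan.emr.ProfitCalc.
  mainClass in Compile := Some("com.adnan.emr.ProfitCalc")

  // Typical sbt-assembly merge strategy for duplicate META-INF entries in a fat JAR.
  assemblyMergeStrategy in assembly := {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case _                             => MergeStrategy.first
  }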

How to compile:

  1. git clone https://github.com/adnanalvee/Spark-Scala-Sbt-Template.git
  2. Open with any text editor or IDE of your choice and start writing code.
  3. Once done, open a terminal in the project root and run sbt assembly (this requires the sbt-assembly plugin; see the sketch after this list).
  4. The assembled JAR will now be in the target folder of the repo.
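The sbt assembly task comes from the sbt-assembly plugin, declared in project/plugins.sbt. A minimal sketch (the plugin version is an assumption; keep whatever the repo already pins):

  // project/plugins.sbt
  // Enables the `sbt assembly` task used above.
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")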

How to run as is locally (using Homebrew on macOS):

  1. Install Homebrew on your Mac.
  2. Run brew install apache-spark (this installs Apache Spark locally on your machine).
  3. Go to the repo directory, then into target/scala-2.12.
  4. Run the following command from that directory, changing the path parameters to match your environment; the arguments after the JAR are the input CSV and the output directory.

     spark-submit \
       --master local \
       --driver-memory 2g \
       --executor-memory 2g \
       --class com.adnan.emr.ProfitCalc \
       /Users/yourprofile/prefix/spark-scala-sbt-template-master/target/scala-2.12/SparkProfitCalc.jar \
       /Users/yourprofile/prefix/spark_launcher/bin/spark_code/spark-scala-sbt-template-master/target/scala-2.12/sales_data.csv \
       /Users/yourprofile/prefix/spark_launcher/bin/spark_code/spark-scala-sbt-template-master/target/scala-2.12/output
  5. Once the job runs successfully, you will find the output in the output directory under target/scala-2.12.
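To inspect the result from a spark-shell session, something like the following works, assuming the job writes its output as CSV with a header row (an assumption; adjust the reader to whatever format the job actually writes):

  // Inside spark-shell, which already provides the `spark` session.
  val result = spark.read
    .option("header", "true")          // assumption: output has a header row
    .csv("target/scala-2.12/output")   // path from the run above

  result.show()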