This is a sample Spark job, written in Scala, that runs a computation on dummy sales data provided as a CSV file. See the instructions below on how to run it locally and trigger a test run.
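For context, here is a minimal pure-Scala sketch of the kind of per-product aggregation such a job might perform over sales rows. The column layout (product, revenue, cost) and the object name ProfitSketch are assumptions for illustration only — the real schema lives in sales_data.csv and the real logic in the repo's code:

```scala
// Illustrative sketch only: assumes CSV rows of the form "product,revenue,cost".
object ProfitSketch {
  // Parse one CSV row into (product, profit), where profit = revenue - cost
  def parseLine(line: String): (String, Double) = {
    val cols = line.split(",").map(_.trim)
    (cols(0), cols(1).toDouble - cols(2).toDouble)
  }

  // Aggregate profit per product, analogous to a Spark groupBy/reduce
  def profitByProduct(lines: Seq[String]): Map[String, Double] =
    lines.map(parseLine)
      .groupBy(_._1)
      .map { case (product, rows) => product -> rows.map(_._2).sum }
}
```

A Spark version of this would read the CSV into an RDD or DataFrame and do the same grouping distributed across executors.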
Requirements: Scala, sbt
Default settings in the build.sbt file; change them according to your project. Note that the JAR name and main class here must match what you pass to spark-submit below:

name := "SampleSparkSubmitTemplate"
version := "2.0"
scalaVersion := "2.12.0"
assemblyJarName in assembly := "SparkProfitCalc.jar"
mainClass in Compile := Some("com.adnan.emr.ProfitCalc")
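The sbt assembly step used below is provided by the sbt-assembly plugin; if your project does not already declare it, add it in project/plugins.sbt (the version number here is illustrative — use the release that matches your sbt version):

```scala
// project/plugins.sbt — enables the `assembly` task used to build the fat JAR
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```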
git clone https://github.com/adnanalvee/Spark-Scala-Sbt-Template.git
- Open with any text editor or IDE of your choice and start writing code.
- Once done, open a terminal and run
sbt assembly
- Your assembled JAR will now be in the target folder of the repo.
- Install Homebrew on your Mac.
- Run 'brew install apache-spark' (this installs Apache Spark locally on your machine).
- Go to target/scala-2.12 inside the code directory.
- Run the following command from that directory, changing the path parameters to match your environment.
spark-submit \
--master local \
--driver-memory 2g \
--executor-memory 2g \
--class com.adnan.emr.ProfitCalc \
/Users/yourprofile/prefix/spark-scala-sbt-template-master/target/scala-2.12/SparkProfitCalc.jar \
/Users/yourprofile/prefix/spark_launcher/bin/spark_code/spark-scala-sbt-template-master/target/scala-2.12/sales_data.csv \
/Users/yourprofile/prefix/spark_launcher/bin/spark_code/spark-scala-sbt-template-master/target/scala-2.12/output
- Once the job runs successfully, you will find the output in the output directory under target/scala-2.12.
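Note that Spark writes that output as a directory of part-* files rather than a single file. A small plain-Scala helper to stitch them back into one string (the MergeParts name is mine, not part of the repo):

```scala
import java.io.File
import scala.io.Source

object MergeParts {
  // Concatenate the contents of Spark's part-* files from an output directory,
  // in file-name order, skipping markers like _SUCCESS.
  def mergeParts(outputDir: String): String = {
    val parts = new File(outputDir).listFiles()
      .filter(_.getName.startsWith("part-"))
      .sortBy(_.getName)
    parts.map { f =>
      val src = Source.fromFile(f)
      try src.mkString finally src.close()
    }.mkString
  }
}
```

Alternatively, re-running spark-submit will fail if the output directory already exists, so delete it between runs.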