Follow this article to find more detailed instructions.
Modify the class "MainExample.scala" writing your Spark code, then compile the project with the command:
mvn clean package
Inside the /target
folder you will find the result fat jar called spark-scala-maven-project-0.0.1-SNAPSHOT-jar-with-depencencies.jar
. In order to launch the Spark job use this command in a shell with a configured Spark environment:
spark-submit --class com.examples.MainExample \
--master yarn-cluster \
spark-scala-maven-project-0.0.1-SNAPSHOT-jar-with-depencencies.jar \
inputhdfspath \
outputhdfspath
The parameters inputhdfspath
and outputhdfspath
don't have to present the form hdfs://path/to/your/file
but directly /path/to/your/files/
because submitting a job the default file system is HDFS. To retrieve the result locally:
hadoop fs -getmerge outputhdfspath resultSavedLocally