To build the application, sbt (the Scala Build Tool) must be installed. Check the installed version of sbt with the command:
sbt --version
Then go to the project's root directory and run:
sbt assembly
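The assembly task is provided by the sbt-assembly plugin; if the project does not already enable it, a minimal setup looks roughly like the sketch below (the plugin version is an example, not a value taken from this repository):

// project/plugins.sbt - enables the sbt assembly task (plugin version is illustrative)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

After a successful sbt assembly, the fat JAR is written under target/scala-<scala-version>/ and its path is what is later passed to spark-submit as <path-to-JAR>.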
To be done.
To create a standalone cluster on Linux, use:
# Start master node
./start-master.sh
# Start worker node
./start-slave.sh <master-URL>
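Both scripts are located in the sbin directory of the Spark installation, and <master-URL> is the spark:// address printed when the master starts (port 7077 by default). A minimal sketch, assuming Spark is installed under ${SPARK_HOME} and both nodes run on the local machine:

# Start master node; it announces a URL of the form spark://<hostname>:7077
${SPARK_HOME}/sbin/start-master.sh
# Start worker node and attach it to that master
${SPARK_HOME}/sbin/start-slave.sh spark://localhost:7077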
According to the Apache Spark documentation:
"The launch scripts do not currently support Windows.
To run a Spark cluster on Windows, start the master and workers by hand."
To create a standalone cluster on Windows, use:
# Start master node
spark-class org.apache.spark.deploy.master.Master
# Start worker node
spark-class org.apache.spark.deploy.worker.Worker <master-URL>
To run an application using Apache Spark, use the spark-submit script:
spark-submit --class "app.BioApp" --master <master-URL> <path-to-JAR>
Additional flags that may be useful when running an application:
--verbose, -v - enable debug output
--total-executor-cores [NUM] - total number of cores used across all executors
--executor-cores [NUM] - number of cores used by each executor
For other options, see spark-submit --help.
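For example, a complete invocation combining these flags could look as follows (the master URL, core counts, and JAR path are placeholders for illustration only):

spark-submit --class "app.BioApp" \
  --master spark://localhost:7077 \
  --total-executor-cores 8 \
  --executor-cores 2 \
  --verbose \
  target/scala-2.12/bioapp-assembly-0.1.jar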
To be done.
Information about running nodes is available in a browser at localhost:<master-port>, which is displayed when the master node starts. By default, the master web UI is bound to localhost:8080.
To access the Spark Web UI and display information about jobs, stages, storage, etc., open a browser and go to:
localhost:4040
If more than one application is running at the same time, subsequent applications are bound to the following ports: localhost:4041, localhost:4042, and so on.
Note: this UI is available only while the application is running. To view the UI of already finished applications, see the Monitoring and Instrumentation page.
The Web UI for HDFS management is accessible via:
localhost:9870
The following problems may occur while submitting the application to Apache Spark:
Ensure that the correct class and package names are given as arguments to spark-submit and that the chosen class has a main function implemented.
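For illustration, the class passed via --class is expected to have roughly the following shape (a minimal sketch, not the actual app.BioApp implementation from this repository):

package app

import org.apache.spark.sql.SparkSession

object BioApp {
  // spark-submit invokes exactly this signature on the class given via --class
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BioApp").getOrCreate()
    // ... application logic ...
    spark.stop()
  }
}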
The most common reason for this issue is an IP address mismatch between the value given in SparkController.scala while building the SparkSession and the value of the SPARK_LOCAL_IP environment variable set in ${SPARK_HOME}/conf/spark-env.sh.
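A hedged sketch of how the two values must line up (the real SparkController.scala is not reproduced here; the address 192.168.0.10 is purely illustrative):

// SparkController.scala (sketch): host used while building the SparkSession
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("BioApp")
  .master("spark://192.168.0.10:7077")          // host must match SPARK_LOCAL_IP
  .config("spark.driver.host", "192.168.0.10")  // same address as exported below
  .getOrCreate()

// ${SPARK_HOME}/conf/spark-env.sh should then contain:
// export SPARK_LOCAL_IP=192.168.0.10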
Step 1. Download the winutils.exe file from the directory dedicated to the Apache Spark version in use on this website.
Step 2. Create a directory C:\Program Files\Hadoop\bin and place the winutils.exe file inside.
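The Hadoop code bundled with Spark locates winutils.exe through the HADOOP_HOME environment variable or the hadoop.home.dir JVM property, either of which should point to the directory that contains bin\winutils.exe. A minimal sketch, assuming the path from Step 2:

// Set before the SparkSession is created; points to the parent of bin\winutils.exe
System.setProperty("hadoop.home.dir", "C:\\Program Files\\Hadoop")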
Issue 4 - java.lang.NoSuchMethodError: com.google.common.hash.Hasher.putUnencodedChars while running MHAP
This issue occurs when Apache Spark ships with a version of the Guava library that is too old. The version in use can be checked in the Spark Web UI, in the Environment > Classpath Entries tab.
The putUnencodedChars method was added to Guava in release 15. The most straightforward solution is to download the JAR of the latest Guava version and replace the old one in the ${SPARK_HOME}/jars directory.
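To check which Guava version Spark currently ships, the jars directory can also be inspected directly (assuming a Unix-like shell):

# The version encoded in the file name should be 15 or higher for MHAP to work
ls ${SPARK_HOME}/jars | grep guava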
To run one of the provided examples, build the application according to the instructions above and use:
spark-submit --class "examples.<example-name>" --master <master-URL> <path-to-JAR>
Apache Spark documentation
sbt Reference Manual
Scala documentation