bio-app

General information

An application for DNA sequence analysis, written in Scala and deployed using Apache Spark and Hadoop.
Table of contents

Prerequisites
Building an application
Dependencies
Deploying an application to Apache Spark
Usage
Configuration
Monitoring
Troubleshooting
Examples
References

Prerequisites

To build the application, the Scala Build Tool (SBT) must be installed. Check the installed version of SBT with:

sbt --version

Building an application

Then go to the project's main directory and run:

sbt assembly
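
If the build succeeds, sbt assembly writes the assembled (fat) JAR under the target directory. The path below is only an illustration, assuming default sbt-assembly settings; the exact file name depends on the project's Scala version and build configuration:

target/scala-2.12/bio-app-assembly-0.1.0.jar

This is the <path-to-JAR> expected by spark-submit in the Usage section.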

Dependencies

To be done.

Deploying an application to Apache Spark

Linux

To create a standalone cluster on Linux, use:

# Start master node
./start-master.sh

# Start worker node
./start-slave.sh <master-URL>
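
When it starts, the master prints its URL in the form spark://host:port (port 7077 by default). A minimal sketch of attaching a worker, assuming the master runs on the local machine with the default port:

./start-slave.sh spark://localhost:7077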

Windows

According to the Apache Spark documentation:

"The launch scripts do not currently support Windows. To run a Spark cluster on Windows, start the master and workers by hand."

To create a standalone cluster on Windows, use:

# Start master node
spark-class org.apache.spark.deploy.master.Master

# Start worker node
spark-class org.apache.spark.deploy.worker.Worker <master-URL>
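
As on Linux, the worker needs the spark:// URL printed by the master. An example with assumed values only (host and port depend on the actual setup):

spark-class org.apache.spark.deploy.worker.Worker spark://192.168.0.10:7077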

Usage

To run the application on Apache Spark, use the spark-submit script:

spark-submit --class "app.BioApp" --master <master-URL> <path-to-JAR>

Additional flags that may be useful when running the application:

--verbose, -v - enable debug output
--total-executor-cores [NUM] - total number of cores for all executors
--executor-cores [NUM] - number of cores used by each executor

For other options, see spark-submit --help.
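
A complete invocation could look as follows; the master URL, core counts, and JAR path are examples and should be adapted to the actual cluster and build output:

spark-submit --class "app.BioApp" \
  --master spark://localhost:7077 \
  --total-executor-cores 4 \
  --executor-cores 2 \
  target/scala-2.12/bio-app-assembly-0.1.0.jar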

Configuration

To be done.

Monitoring

Information about running nodes is available in a browser at the <master-URL> displayed when the master node was started:

localhost:<master-port>

By default, the master Web UI is bound to port 8080 (localhost:8080).


To access the Spark Web UI and display information about jobs, stages, storage, etc., open a browser and go to:

localhost:4040

If more than one application is running at the same time, they bind to subsequent ports: localhost:4041, localhost:4042, and so on.

Note: this UI is available only while the application is running. To restore the UI of already finished applications, see the Monitoring and instrumentation page of the Spark documentation.
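
A sketch of one way to keep finished applications visible, assuming event logging is acceptable for the deployment (the log directory is only an example):

# enable event logging when submitting the application
spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=file:///tmp/spark-events ...

# start the History Server, which replays the logged events
./start-history-server.sh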


The Web UI for HDFS management is accessible at:

localhost:9870

Troubleshooting

The following problems may occur while submitting the application to Apache Spark:

Issue 1 - Failed to load class [class name]

Solution

Ensure that the correct class and package names are passed to spark-submit and that the chosen class implements a main function.

Issue 2 - Connection refused: localhost/[ip_address]:[port]

Solution

The most common reason for this issue is an IP address mismatch between the value given in SparkController.scala while building the SparkSession and the value of the SPARK_LOCAL_IP environment variable set in ${SPARK_HOME}/conf/spark-env.sh. Make sure both refer to the same address.
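
A sketch of the relevant line in spark-env.sh; the address below is only an example and must match the one used in SparkController.scala when the SparkSession is built:

# ${SPARK_HOME}/conf/spark-env.sh
SPARK_LOCAL_IP=127.0.0.1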

Issue 3 - Did not find winutils.exe (Windows only)

Solution

Step 1. Download the winutils.exe file from the directory corresponding to the Apache Spark version in use from this website.
Step 2. Create a directory C:\Program Files\Hadoop\bin and place the winutils.exe file inside.
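
Depending on the environment, Spark on Windows may additionally require the HADOOP_HOME variable to point at that directory; a sketch assuming the path from Step 2:

setx HADOOP_HOME "C:\Program Files\Hadoop"

After setting the variable, add %HADOOP_HOME%\bin to PATH and reopen the terminal.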

Issue 4 - java.lang.NoSuchMethodError: com.google.common.hash.Hasher.putUnencodedChars while running MHAP

This issue occurs when Apache Spark ships a version of the Guava library that is too old. The version in use can be checked in the Spark Web UI, in the Environment > Classpath Entries tab.

The putUnencodedChars method was added to Guava in release 15.0.

Solution

The most straightforward solution is to download the JAR of the latest Guava version and replace the old one in the ${SPARK_HOME}/jars directory.
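
To check which Guava JAR Spark currently ships, the jars directory can be inspected; a sketch for a Unix-like shell:

ls ${SPARK_HOME}/jars | grep guava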

Examples

To run one of the provided examples, build the application according to the instructions above and use:

spark-submit --class "examples.<example-name>" --master <master-URL> <path-to-JAR>

References

Apache Spark documentation
sbt Reference Manual
Scala documentation

Return to the top