spark-flamegraph

Easy CPU Profiling for Apache Spark applications

Primary language: Shell. License: Apache License 2.0 (Apache-2.0).

The spark-submit-flamegraph script is a wrapper around the standard spark-submit command that generates a flame graph of the application's CPU usage.

Supported Systems

  • Amazon EMR
  • Most Linux distributions
  • Mac (with Homebrew installed)

Prerequisites

The script is adapted to work on Amazon EMR. On other systems, the following utilities must be present:

  • perl
  • python2.7
  • pip
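A quick way to confirm the prerequisites before running the wrapper is sketched below. The helper name missing_tools is made up for this example, and the binary names (in particular python2.7) are assumed to match how the tools are installed on your system.

```shell
# Print the names of any given commands that are not found on PATH, one per line.
# missing_tools is a hypothetical helper, not part of spark-submit-flamegraph.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Binary names are assumptions; adjust them to your installation.
missing=$(missing_tools perl python2.7 pip)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing >&2
else
  echo "All prerequisites found"
fi
```

If anything is reported as missing, install it with your system's package manager before continuing.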

Running

wget -O /usr/local/bin/spark-submit-flamegraph \
  https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph

chmod +x /usr/local/bin/spark-submit-flamegraph

Use spark-submit-flamegraph as a replacement for the spark-submit command.

To change the Spark command used to run the application, set the SPARK_CMD environment variable. For instance, to run spark-shell:

SPARK_CMD=spark-shell /usr/local/bin/spark-submit-flamegraph
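An override like this is commonly implemented with a default-value parameter expansion. The snippet below is only an illustration of that pattern: the function resolve_spark_cmd is hypothetical, and the assumption that the default is spark-submit comes from the description above rather than from the script's source.

```shell
# Print the command to launch: SPARK_CMD if it is set and non-empty,
# otherwise spark-submit (assumed default).
# resolve_spark_cmd is a hypothetical name used only for this sketch.
resolve_spark_cmd() {
  echo "${SPARK_CMD:-spark-submit}"
}

echo "Would launch: $(resolve_spark_cmd)"
```

Setting the variable inline, as in the spark-shell example, affects only that one invocation and leaves the rest of the shell session unchanged.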

Details

The script performs the following steps to make profiling Spark applications as easy as possible:

  • Downloads InfluxDB and starts it on a random port.
  • Starts the Spark application using the original spark-submit command, with the StatsD profiler JAR on its classpath and configured to report statistics back to the InfluxDB instance.
  • After the Spark application finishes, queries all the reported metrics from the InfluxDB instance.
  • Runs a script that generates the flame graph as an SVG file.
  • Stops the InfluxDB instance.
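The start, run, and stop lifecycle in the steps above can be sketched in shell as follows. This is not the script's actual code: with_background_service is a hypothetical name, and sleep 60 stands in for the InfluxDB process so that the example runs without any downloads.

```shell
# Run a command while a background helper is alive, then stop the helper
# and return the command's exit status.
# "sleep 60" is a stand-in for the InfluxDB instance (assumption for this sketch).
with_background_service() {
  sleep 60 &
  service_pid=$!

  # Run the wrapped command (in the real workflow, the Spark application).
  "$@"
  status=$?

  # Stop the helper whether or not the command succeeded.
  kill "$service_pid" 2>/dev/null || true
  wait "$service_pid" 2>/dev/null || true
  return "$status"
}

with_background_service echo "Spark application would run here"
```

Capturing the exit status before stopping the helper lets the wrapper report the application's own result, which matters when the wrapper is used as a drop-in replacement for another command.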