This plugin enables context-aware profiling of a Spark application. It collects profiling data from both the driver and the executors to produce a detailed view (flame graphs) of the application's CPU usage, memory allocations, lock contention, etc. The implementation uses two components:
- Pyroscope Java agent - collects profiling samples from the Java processes and reports them to a Pyroscope server.
- Apache Spark plugin - a custom plugin implementation that starts the agent on every node and attaches context to the profiling data at runtime. It uses the plugin callbacks `onTaskStart`, `onTaskSucceeded`, and `onTaskFailed` to mark the execution flow with the relevant context labels for the profiler: executor, stage, and partition.
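The labeling pattern described above can be sketched as follows. This is a minimal, self-contained illustration only: it uses a hypothetical `ThreadLocal` label map as a stand-in for the real Pyroscope agent API, and the method names simply mirror the Spark executor-plugin callbacks mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the task-labeling flow. LABELS is a hypothetical stand-in for the
// profiler's per-thread label context; it is NOT the actual Pyroscope API.
public class TaskLabelingSketch {
    // Per-thread profiling labels, mimicking a scoped label context.
    static final ThreadLocal<Map<String, String>> LABELS =
        ThreadLocal.withInitial(HashMap::new);

    // Mirrors the onTaskStart callback: attach context labels before the task runs,
    // so samples collected during the task carry executor/stage/partition labels.
    static void onTaskStart(String executorId, int stageId, int partitionId) {
        Map<String, String> labels = LABELS.get();
        labels.put("executor", executorId);
        labels.put("stage", String.valueOf(stageId));
        labels.put("partition", String.valueOf(partitionId));
    }

    // Mirrors onTaskSucceeded / onTaskFailed: drop the labels when the task ends,
    // so subsequent samples on this thread are no longer attributed to it.
    static void onTaskEnd() {
        LABELS.get().clear();
    }

    public static void main(String[] args) {
        onTaskStart("exec-1", 3, 42);
        System.out.println(LABELS.get().get("stage"));   // prints 3
        onTaskEnd();
        System.out.println(LABELS.get().isEmpty());      // prints true
    }
}
```

Because the labels are applied around each task, every profiling sample taken while a task runs can later be filtered or aggregated by executor, stage, or partition.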
The result lets you investigate the profiling data in multiple ways - either as a whole, using an aggregated view of the entire application, or zoomed in, by breaking it down by executor, stage, or partition.
- Download the released plugin jar.
- Submit your Spark job with the plugin configuration, e.g.:
```shell
./spark-submit \
  ... your arguments ... \
  --jars <YOUR_PATH_TO_THE_PLUGIN_JAR_LOCATION>/spark_profiling_plugin-1.0.0-jar-with-dependencies.jar \
  --conf spark.plugins='com.github.tomsisso.spark.plugins.profiling.SparkProfilingPlugin' \
  --conf spark.plugins.profiling.plugin.server.address='http://<YOUR_PYROSCOPE_SERVER>:4040' \
  --conf spark.plugins.profiling.plugin.upload.interval.seconds=10 \
  /<YOUR_APP>.jar
```
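Equivalently, the same settings can be kept in a Spark properties file and passed with spark-submit's standard `--properties-file` flag, which keeps the command line shorter. A sketch (the server address is a placeholder, as above):

```
spark.plugins                                           com.github.tomsisso.spark.plugins.profiling.SparkProfilingPlugin
spark.plugins.profiling.plugin.server.address           http://<YOUR_PYROSCOPE_SERVER>:4040
spark.plugins.profiling.plugin.upload.interval.seconds  10
```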
This repo includes an end-to-end (E2E) demo environment: a docker-compose.yml with the relevant Spark, Pyroscope, and Grafana containers, plus a predefined Grafana dashboard.
Start here