SparkCruise is an automatic computation reuse system developed for Spark. It detects overlapping computations in a past query workload and enables their automatic materialization and reuse in future Spark SQL queries.
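To make the idea of "overlapping computations" concrete, here is a toy sketch in plain Python. It is not SparkCruise's actual algorithm or API: the tuple-based plan representation and the names `signature`, `subtrees`, and `find_overlaps` are hypothetical, chosen only for illustration. It assumes each query plan can be modeled as a nested tuple and treats two subtrees as overlapping when their canonical strings match.

```python
from collections import defaultdict

# Hypothetical plan model: (operator, argument, *children).
# This is an illustrative stand-in, not Spark's real plan classes.

def signature(plan):
    """Return a canonical string for a plan subtree."""
    op, arg, *children = plan
    return f"{op}[{arg}](" + ",".join(signature(c) for c in children) + ")"

def subtrees(plan):
    """Yield every subtree of a plan, root first."""
    yield plan
    for child in plan[2:]:
        yield from subtrees(child)

def find_overlaps(workload):
    """Map subtree signatures to the queries containing them,
    keeping only signatures shared by two or more queries."""
    seen = defaultdict(set)
    for name, plan in workload.items():
        for sub in subtrees(plan):
            seen[signature(sub)].add(name)
    return {sig: names for sig, names in seen.items() if len(names) > 1}

# Two queries that share the same filtered scan but aggregate differently.
scan = ("Scan", "sales")
filt = ("Filter", "year = 2020", scan)
q1 = ("Aggregate", "sum(amount) by region", filt)
q2 = ("Aggregate", "count(*) by product", filt)

overlaps = find_overlaps({"q1": q1, "q2": q2})

# The longest shared signature is the largest common subtree,
# i.e. the most useful candidate to materialize once and reuse.
candidate = max(overlaps, key=len)
print(candidate)
```

In this sketch the filtered scan is shared by both queries, so it is the natural candidate to compute once and reuse. A real system would also have to weigh the cost of storing the result against the time saved, which this example ignores.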
- Query workload
HDInsight clusters come with the SparkCruise library and its configuration options pre-installed.
Please refer to the detailed documentation here.
We have also developed the Workload Insights Notebook (WIN) to help users derive insights from their query workloads. To use WIN, first run the analyze command:
sudo /opt/peregrine/analyze/peregrine.sh analyze
Then import the notebook (available here) into Jupyter.
SparkCruise and Workload Insights Notebook Demo at Spark+AI Summit 2020.