| File/folder | Description |
|---|---|
| `.github` | GitHub-specific configuration |
| `.gitignore` | Defines what to ignore at commit time |
| `aks-spark-chart` | Helm charts |
| `benchmark` | Benchmark test code |
| `docs` | Project documentation |
| `env` | Terraform to build the environment |
| `results` | Benchmark result images |
| `spark` | Spark Docker containers and configuration |
| `CODE_OF_CONDUCT.md` | Code of conduct for this project |
| `CONTRIBUTING.md` | Guidelines for contributing to the sample |
| `CHANGELOG.md` | List of changes to the sample |
| `LICENSE` | The license for the sample |
| `README.md` | This README file |
| `SECURITY.md` | Security policy for this project |
| `SUPPORT.md` | Support policy for this project |
This project requires the user to have access to the following:

- An Azure Active Directory (AAD) tenant and the ability to create AAD applications
- An Azure subscription

This project also requires a development environment with the following tools installed:
TPC-DS is an industry-standard benchmark developed by the Transaction Processing Performance Council (TPC). It is used to measure the performance of decision support solutions. The benchmark specification and provided tools may be accessed at www.tpc.org.
This project implements a derivative of the TPC-DS benchmark executed using the Databricks spark-sql-perf library. In this derivative benchmark, we evaluated and measured the performance of Spark SQL on Azure Kubernetes Service (AKS). Our tests were limited to the q64-v2.4, q70-v2.4, and q82-v2.4 queries.
Follow the steps described in the quick start guide to set up and run the benchmark.
The benchmark was executed on two different node sizes across three OS disk configurations.
| Node size | Node count | OS disk size (GB) | OS disk type |
|---|---|---|---|
| Standard_DS13_v2 | 5 | 256 | Ephemeral |
| Standard_DS13_v2 | 5 | 256 | Premium |
| Standard_L8s_v2 | 5 | 256 | NVMe |
The following `sparkConfig` was used for this benchmark.

| sparkConfig | Value |
|---|---|
| spark.driver.cores | 4 |
| spark.driver.memory | 16000m |
| spark.driver.memoryOverhead | 2000m |
| spark.executor.cores | 4 |
| spark.executor.memory | 16000m |
| spark.executor.memoryOverhead | 2000m |
| Serializer | Value | Default |
|---|---|---|
| spark.serializer | org.apache.spark.serializer.KryoSerializer | Java serialization |
Additional parameters are documented in the SparkApplication YAML.
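The settings above map onto fields of a spark-operator `SparkApplication` manifest. The following is a minimal illustrative fragment, not the project's actual manifest; the application name and container image are placeholders:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: tpcds-benchmark        # placeholder name
spec:
  type: Scala
  mode: cluster
  image: <your-registry>/spark:latest   # placeholder image
  sparkConf:
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
  driver:
    cores: 4
    memory: "16000m"
    memoryOverhead: "2000m"
  executor:
    instances: 5               # assumed; match it to your node count
    cores: 4
    memory: "16000m"
    memoryOverhead: "2000m"
```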
Please note that these are unaudited results and as such are not comparable with any officially published TPC-DS results.
Each query was executed for 10 iterations in total, and the median execution time was recorded.
- Execution time (in seconds) of q64 with Ephemeral, Premium, and NVMe disks on D- and L-series VMs
- Execution time (in seconds) of q82 and q70 with Ephemeral vs. Premium OS disks
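The median-of-10 methodology above can be sketched as follows; the iteration times are hypothetical values for illustration, not measured results:

```python
import statistics

# Hypothetical per-iteration execution times (seconds) for one query;
# real values come from the benchmark's result output.
iteration_times = [412.0, 398.5, 405.2, 420.1, 401.7,
                   399.9, 410.3, 404.8, 407.6, 402.2]

# The benchmark runs each query 10 times and records the median,
# which is robust to occasional slow outlier runs.
median_time = statistics.median(iteration_times)
print(f"median execution time: {median_time:.1f}s")
```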
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
- Many thanks to @juan-lee and @alexeldeib for reviewing the AKS and NVMe setup.
- Thanks to @alokjain-01 for looking into the Spark parameters.