This is a performance testing framework for Apache Spark 1.0+.
Features:
- Suites of performance tests for Spark, PySpark, Spark Streaming, and MLlib.
- Parameterized test configurations:
  - Sweeps sets of parameters to test against multiple Spark and test configurations.
- Automatically downloads and builds Spark:
  - Maintains a cache of successful builds to enable rapid testing against multiple Spark versions.
- [...]
For questions, bug reports, or feature requests, please open an issue on GitHub.
The `spark-perf` scripts require Python 2.7+. If you're using an earlier version of Python, you may need to install the `argparse` library using `easy_install argparse`.
Support for automatically building Spark requires Maven. On `spark-ec2` clusters, this can be installed using the `./bin/spark-ec2/install-maven` script from this project.
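As a quick sanity check, the following commands (illustrative; they assume `python` and `mvn` are on your `PATH`) verify both requirements:

```bash
python --version   # spark-perf needs 2.7+, or 2.6 plus the argparse backport
mvn -version       # needed only if spark-perf will download and build Spark
```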
To configure `spark-perf`, copy `config/config.py.template` to `config/config.py` and edit that file. See `config.py.template` for detailed configuration instructions. After editing `config.py`, execute `./bin/run` to run performance tests. You can pass the `--config` option to use a custom configuration file.
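Concretely, a first run typically looks like this (a sketch; it assumes you are in the repository root, and `my_config.py` is a hypothetical custom configuration file):

```bash
cp config/config.py.template config/config.py
# edit config/config.py to suit your environment, then:
./bin/run
# or point at a custom configuration file:
./bin/run --config my_config.py
```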
The following sections describe some additional settings to change for certain test environments:
Running locally:

- Set up a local SSH server and keys such that `ssh localhost` works on your machine without a password (one approach is sketched after this list).
- Set config.py options that are friendly for local execution:

  ```
  SPARK_HOME_DIR = "/path/to/your/spark"
  SPARK_CLUSTER_URL = "spark://%s:7077" % socket.gethostname()
  SCALE_FACTOR = .05
  SPARK_DRIVER_MEMORY = "512m"
  spark.executor.memory = 2g
  ```
- Uncomment at least one `SPARK_TESTS` entry.
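For the first step above, one common way to enable passwordless `ssh localhost` (a sketch; assumes OpenSSH with a local SSH server already running and no existing key pair):

```bash
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa         # generate a key with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # authorize it for logins to localhost
chmod 600 ~/.ssh/authorized_keys
ssh localhost true                               # should complete without a password prompt
```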
Running on an existing Spark cluster:

- SSH into the machine hosting the standalone master.
- Set config.py options:

  ```
  SPARK_HOME_DIR = "/path/to/your/spark/install"
  SPARK_CLUSTER_URL = "spark://<your-master-hostname>:7077"
  SCALE_FACTOR = <depends on your hardware>
  SPARK_DRIVER_MEMORY = <depends on your hardware>
  spark.executor.memory = <depends on your hardware>
  ```
- Uncomment at least one `SPARK_TESTS` entry.
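If you are unsure of the exact `spark://` URL, the standalone master's web UI reports it; one illustrative way to extract it (assumes the UI is listening on its default port, 8080):

```bash
curl -s http://<your-master-hostname>:8080 | grep -o 'spark://[^"< ]*'
```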
Running on a spark-ec2 cluster with a custom Spark version:

- Launch an EC2 cluster with Spark's EC2 scripts.
- Set config.py options:

  ```
  USE_CLUSTER_SPARK = False
  SPARK_COMMIT_ID = <the commit you want to test>
  SCALE_FACTOR = <depends on your hardware>
  SPARK_DRIVER_MEMORY = <depends on your hardware>
  spark.executor.memory = <depends on your hardware>
  ```
- Uncomment at least one `SPARK_TESTS` entry.
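For reference, launching a cluster with Spark's EC2 scripts follows this general pattern (a sketch based on the spark-ec2 usage documented by Spark; the key pair, identity file, cluster size, and cluster name are placeholders):

```bash
./ec2/spark-ec2 -k <keypair-name> -i /path/to/<keypair-file>.pem -s <num-slaves> launch <cluster-name>
```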
This testing framework started as a port, with heavy modifications, of an earlier Spark performance testing framework written by @dennybritz.
This project is licensed under the Apache 2.0 License. See LICENSE for full license text.