A standardized benchmark suite for auto-tuners
BAT is a standardized benchmark suite for auto-tuners. It is based on benchmarks from SHOC and contains benchmarks for CUDA programs, covering both whole programs and kernel code. After auto-tuning is completed, BAT saves all JSON and CSV results to its own results directory. It then parses the specified files and prints the best parameters found by the auto-tuner. The parameters and other benchmarking information are pretty-printed in the terminal.
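As a rough illustration of the parse-and-print step, the sketch below reads a JSON results file and formats its contents as an aligned listing. The file layout (a single flat JSON object mapping parameter names to values) and the function names `load_best_parameters` and `format_parameters` are assumptions made for this example; they are not taken from BAT's implementation.

```python
import json
from pathlib import Path


def load_best_parameters(path):
    """Read a results file that is assumed to hold one flat JSON object."""
    with Path(path).open() as f:
        return json.load(f)


def format_parameters(params):
    """Return the parameters as aligned 'name  value' lines, sorted by name."""
    if not params:
        return "(no parameters found)"
    width = max(len(name) for name in params)
    lines = [f"{name.ljust(width)}  {params[name]}" for name in sorted(params)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical parameter names and values, for demonstration only.
    example = {"BLOCK_SIZE": 256, "UNROLL": 4}
    print(format_parameters(example))
```

A real results file may nest its data differently, in which case `load_best_parameters` would need to pick out the relevant object first.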
This benchmark suite will be useful for you if you're making your own auto-tuner and want to use the benchmarks for testing or would like to compare your auto-tuner to other known auto-tuners. BAT can also be used to check how a parameter's value changes for different architectures.
The parameters and search space for the algorithms can be seen in the src directory.
- Python 3 (Or Docker, see section Within a Docker container)
Without using Docker, the following steps are required to download and install the auto-tuners:
- OpenTuner: can be installed along with the other needed dependencies by running pip3 install -r requirements.txt from the tuning_examples/opentuner directory.
- Kernel Tuner: can be installed along with the other needed dependencies by running pip3 install -r requirements.txt from the tuning_examples/kernel_tuner directory.
- CLTune: set the environment variable CLTUNE_PATH=/path/to/CLTune before using the benchmarks.
- KTT: set the environment variable KTT_PATH=/path/to/KTT before using the benchmarks.
# Run all benchmarks for all auto-tuners
python3 main.py
# Run the `sort` benchmark for all auto-tuners
python3 main.py -b sort
# Run all benchmarks for auto-tuner `OpenTuner`
python3 main.py -a opentuner
# Run benchmark `scan` for auto-tuner `CLTune`
python3 main.py -b scan -a cltune
Default: none
Benchmark to run. Example: sort. If no benchmark is selected, all benchmarks are run for the selected auto-tuner(s).
Default: none
Auto-tuner to run benchmarks for. Example: ktt. If no auto-tuner is selected, all auto-tuners are selected for benchmarking.
Default: false
Whether all stdout and stderr output should be printed while the benchmark(s) are being built. By default, this output is not printed.
Default: 1
Problem size for the data in the benchmarks. By default it uses a problem size of 1. This is up to the specific auto-tuner to handle.
Default: brute_force
Tuning technique to use for benchmarking. If no technique is specified, the brute force technique is selected. This is up to the specific auto-tuner to handle.
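The options and defaults described above could be expressed with `argparse` roughly as follows. This is only a sketch: the short flags `-b`, `-a`, `-s` and `-t` appear in the usage examples in this README, but the long option names, the `--verbose` flag name and the function name `build_parser` are assumptions for illustration and may not match BAT's actual main.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Run auto-tuning benchmarks")
    # The long option names below are assumed for this sketch.
    parser.add_argument("-b", "--benchmark", default=None,
                        help="benchmark to run; all benchmarks if omitted")
    parser.add_argument("-a", "--auto-tuner", dest="auto_tuner", default=None,
                        help="auto-tuner to use; all auto-tuners if omitted")
    # Hypothetical flag name for printing stdout and stderr during the build.
    parser.add_argument("--verbose", action="store_true",
                        help="print build output")
    parser.add_argument("-s", "--size", type=int, default=1,
                        help="problem size for the benchmark data")
    parser.add_argument("-t", "--technique", default="brute_force",
                        help="tuning technique to use")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-b", "sort", "-a", "ktt", "-t", "mcmc", "-s", "4"])
    print(args)
```

Leaving `benchmark` and `auto_tuner` as `None` lets the calling code treat "not given" as "run everything", which matches the defaults described above.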
It is easy to add new auto-tuner implementations for the benchmarks. Just follow these steps:
- Implement the benchmark(s) you want with your auto-tuner. If your auto-tuner tunes a whole program, the benchmarks can be found in src/programs. However, if your auto-tuner tunes kernels, the benchmarks can be found in src/kernels, and you have to generate the input data yourself. This can be done in the same way as in the KTT examples.
- Store your auto-tuner implementation of a benchmark inside an auto-tuner subdirectory in tuning_examples. The path to the benchmark implementation should look similar to ./tuning_examples/kernel_tuner/sort/.
- Create a config.json file in the same directory as the auto-tuner implementation with content similar to this:
{
    "build": [
        "make clean",
        "make"
    ],
    "run": "./sort",
    "results": [
        "best-sort-results.json"
    ]
}
- build: A list of commands that will be run before the run command. Note that chaining commands with && does not work correctly, because of a limitation in the Python subprocess package that is used to run the commands. The solution is to split them into separate list entries.
- run: The command to run the auto-tuning benchmark.
- results: A list of result files that contain the best parameters found in the auto-tuner benchmark. These will be printed out by BAT after the auto-tuning is completed.
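To make the role of the three keys more concrete, here is a minimal sketch of how a config.json like the one above could be consumed. It assumes the commands are split into argument lists and run without a shell, which is one way the && limitation can arise. The function names `load_config` and `run_benchmark` are hypothetical and this is not BAT's actual code.

```python
import json
import shlex
import subprocess
from pathlib import Path


def load_config(directory):
    """Load config.json from the given benchmark directory."""
    with (Path(directory) / "config.json").open() as f:
        return json.load(f)


def run_benchmark(directory, verbose=False):
    """Run the build commands, then the benchmark; return the result file paths."""
    config = load_config(directory)
    # Without a shell, "&&" would be passed to the program as a literal
    # argument instead of chaining commands, hence one list entry per command.
    stream = None if verbose else subprocess.DEVNULL
    for command in config["build"]:
        subprocess.run(shlex.split(command), cwd=directory, check=True,
                       stdout=stream, stderr=stream)
    subprocess.run(shlex.split(config["run"]), cwd=directory, check=True)
    return [Path(directory) / name for name in config["results"]]


if __name__ == "__main__":
    # Shows why "&&" does not chain: it ends up as an ordinary argument.
    print(shlex.split("make clean && make"))
```

With `check=True`, a failing build command raises `subprocess.CalledProcessError`, so the benchmark is not run against a broken build.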
Here are some examples of how to build the different auto-tuner Docker images:
# Build OpenTuner Dockerfile
$ docker build -t bat-opentuner -f docker/opentuner.Dockerfile .
# Build Kernel Tuner Dockerfile
$ docker build -t bat-kernel_tuner -f docker/kernel_tuner.Dockerfile .
# Build CLTune Dockerfile
$ docker build -t bat-cltune -f docker/cltune.Dockerfile .
# Build KTT Dockerfile
$ docker build -t bat-ktt -f docker/ktt.Dockerfile .
Here are some examples of how to run the different auto-tuner Docker containers:
# Run the KTT container
$ docker run -ti --gpus all bat-ktt
# Example of running a container detached
$ docker run -d -ti --gpus all bat-ktt
# Open a shell into a detached container
$ docker exec -it <container-id> sh
# After this the commands shown in the `Running benchmarks` section can be used
# Example:
$ main.py -b sort -a ktt -t mcmc -s 4
Data from running BAT with different auto-tuners can be found here.