NTNU-HPC-Lab/BAT

Different problems are tuned by different tuners


One of the goals of BAT is to facilitate comparison between different tuners. However, in the current state of the code such comparisons are difficult, because the scripts for the different tuners actually tune rather different things. To illustrate with the sort benchmark:

  • CLTune is used to tune three individual kernels from the sort benchmark
  • KTT is used to tune a pipeline of five kernels
  • Kernel Tuner and OpenTuner are used to tune a host code that in turn (repeatedly) calls the needed CUDA kernels; the time measurements include all kernels as well as the memory transfers between host and device memory

To be able to use BAT for meaningful comparisons of the search strategies implemented in different tuners, it is necessary to ensure that the scripts that call the different tuners all tune the same problem.

In addition, kernel performance may depend on the input data. This is likely the case for at least the BFS benchmark (and possibly others; I haven't checked them all yet). In those situations it is also important that all tuners tune exactly the same problem using the same input data, which is currently not the case for BFS; see the sketch below for one way to guarantee this.
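One way to guarantee identical inputs is to generate them deterministically from a fixed seed. The sketch below is purely illustrative: the function name, graph shape, and parameters are assumptions of mine, not BAT's actual input-generation code.

```python
import numpy as np

def generate_bfs_input(seed=42, n_nodes=1 << 16, avg_degree=8):
    """Hypothetical BFS input generator: a fixed seed ensures every
    tuner benchmarks the kernel on exactly the same graph."""
    rng = np.random.default_rng(seed)
    # Random directed edge list with (n_nodes * avg_degree) edges.
    edges = rng.integers(0, n_nodes, size=(n_nodes * avg_degree, 2),
                         dtype=np.uint32)
    return edges
```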

We discussed these issues this morning in an online meeting with @odgaard, and we are of course very interested in contributing to solving them. I'm creating this issue first and foremost to have a public record of our discussion.

One way to solve this problem is to write down the exact requirements of each tuning problem in the benchmark suite: which kernel, which tunable parameters, which input data, and so on. Given that specification, we can implement the necessary scripts for as many tuners as possible. Likely this means focusing on tuning exactly one kernel in each benchmark, since not all tuners support tuning host code and not all tuners support tuning pipelines; tuning a single kernel per benchmark seems to be the one use case all tuners support. A sketch of what such a specification could look like follows below.
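As a purely illustrative sketch, such a specification could be a small, tuner-agnostic data structure like the one below. All field names, kernel names, and parameter values here are assumptions for illustration, not an agreed-upon format.

```python
# Hypothetical problem specification for the sort benchmark; every tuner
# script would read this and tune exactly this kernel on this data.
sort_spec = {
    "benchmark": "sort",
    "kernel_file": "sort.cu",           # assumed file name
    "kernel_name": "radixSortBlocks",   # assumed: exactly one kernel per benchmark
    "problem_size": 1 << 20,
    "tune_params": {                    # the searchable parameter space
        "BLOCK_SIZE": [64, 128, 256, 512],
        "ELEMENTS_PER_THREAD": [1, 2, 4, 8],
    },
    "input_data": {"seed": 42},         # fixed seed -> identical inputs everywhere
    "objective": "kernel_time",         # measure only the kernel itself
}
```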

This issue has been addressed in collaboration with @benvanwerkhoven as part of the second version of BAT. All tuners now map to a consistent interface, so the benchmarks are evaluated through the same interface and backends.
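For illustration only, such a consistent interface might look like the adapter sketch below; the class and method names are my own invention and may not match BAT v2's actual API.

```python
from abc import ABC, abstractmethod

class TunerBackend(ABC):
    """Adapter that exposes one tuner through a shared interface, so
    every tuner is run against the same benchmark specification."""

    @abstractmethod
    def tune(self, spec: dict) -> dict:
        """Search spec['tune_params'] and return the best configuration
        together with its measured objective value."""

class KernelTunerBackend(TunerBackend):
    def tune(self, spec: dict) -> dict:
        # Translate the shared spec into a Kernel Tuner call here; the
        # exact mapping is tuner-specific and omitted in this sketch.
        raise NotImplementedError
```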