google/clspv

MADs can be obfuscated by clspv

olvaffe opened this issue · 4 comments

clpeak has benchmarks whose kernels resemble

    for(int i=0; i<64; i++)
    {
        x = y * x + y;
        y = x * y + x;
    }

clang optimizes the loop body to

    x = (x + 1) * y;
    y = (y + 1) * x;

which is still easy enough for Vulkan drivers to recognize as two MADs.

But clspv optimizes the loop body to

    t = (x + 1);
    x = t * y;
    y = (x + t) * y;

which is much harder for Vulkan drivers to turn back into MADs.
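For reference, a minimal C sketch of the three forms of the loop body, to check that they compute the same values. It assumes unsigned 32-bit integers (so arithmetic wraps and the factoring is exact); `pair_t`, `run64`, and `forms_agree` are hypothetical helper names, not anything from clpeak or clspv.

```c
#include <stdint.h>

/* Hypothetical helper type: the two running values of the loop. */
typedef struct { uint32_t x, y; } pair_t;

/* Loop body as written in the kernel: two multiply-adds. */
static pair_t body_source(pair_t p) {
    p.x = p.y * p.x + p.y;
    p.y = p.x * p.y + p.x;
    return p;
}

/* Form reported for clang: each statement is (a + 1) * b. */
static pair_t body_clang(pair_t p) {
    p.x = (p.x + 1u) * p.y;
    p.y = (p.y + 1u) * p.x;
    return p;
}

/* Form reported for clspv: t is shared between both statements. */
static pair_t body_clspv(pair_t p) {
    uint32_t t = p.x + 1u;
    p.x = t * p.y;
    p.y = (p.x + t) * p.y;   /* uses the new x and the old y */
    return p;
}

/* Run a body 64 times, as in the benchmark loop. */
static pair_t run64(pair_t (*body)(pair_t), pair_t p) {
    for (int i = 0; i < 64; i++)
        p = body(p);
    return p;
}

static int same(pair_t a, pair_t b) {
    return a.x == b.x && a.y == b.y;
}

/* Returns 1 if all three forms agree for the given seed values. */
static int forms_agree(uint32_t x0, uint32_t y0) {
    pair_t p = { x0, y0 };
    pair_t a = run64(body_source, p);
    return same(a, run64(body_clang, p)) && same(a, run64(body_clspv, p));
}
```

The clspv form is still correct, since (x' + t) * y = x' * y + t * y = x' * y + x'. The catch is that a driver has to undo the sharing of `t` before it can see the second MAD.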

Dumping the IR after each pass shows that the difference is due to the two early InstCombinePass instances added by clspv.

#1340 helps when the data types are scalar. But it does not help when the data types are vectors.

Passing -O0 to clspv does not stop it from adding the two early InstCombinePass instances either :(

Yes, -O0 is not really used in clspv (#1228 (comment)).

Maybe we should consider adding something in https://github.com/google/clspv/blob/main/lib/UndoInstCombinePass.cpp?

@olvaffe interesting... can you share perf numbers from the clpeak scalar tests with and without your patch, to get an idea of the expected perf speedups?

The perf numbers doubled for short and char. But I also needed to teach Mesa to replace (x + 1) * y with x * y + y so that it is identified as a MAD.
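A quick sketch of why that rewrite is safe for the narrow integer types: an exhaustive check over all 8-bit inputs, assuming wrap-around (unsigned) semantics. The function names are made up for illustration and are not Mesa code.

```c
#include <stdint.h>

/* (x + 1) * y, truncated to 8 bits: the combined form. */
static uint8_t combined_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x + 1) * y);
}

/* x * y + y, truncated to 8 bits: the form a driver can match as a MAD. */
static uint8_t mad_u8(uint8_t x, uint8_t y) {
    return (uint8_t)(x * y + y);
}

/* Count the input pairs for which the two forms disagree. */
static unsigned count_mismatches_u8(void) {
    unsigned bad = 0;
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++)
            if (combined_u8((uint8_t)x, (uint8_t)y) != mad_u8((uint8_t)x, (uint8_t)y))
                bad++;
    return bad;
}
```

`count_mismatches_u8()` returns 0, since the two forms are equal modulo 2^8. Floating-point types are a different matter, because the two forms can round differently.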

It sounds like UndoInstCombinePass.cpp might be a better place to undo the combining. I can look into that.