ROCm/Tensile

[Feature]: Further FP32 GEMM optimization for gfx11

littlewu2508 opened this issue · 2 comments

https://bugs.gentoo.org/891499#c46

A few months ago I tried building the develop branch of rocBLAS with gfx1100 support, and one user ran some benchmarks. The results show that the 7900XTX performs well on FP16 and mixed-precision GEMM but poorly on FP32 GEMM. The git log seems to indicate that only basic GEMM support and WMMA support have been added so far; further optimization has not landed yet.

Library context

Software version
rocblas e44855972ff75053ba08922ca89ab288e6a9462e

@littlewu2508 Can you please test with the latest ROCm 6.1.2? If the issue no longer occurs, please close the ticket. Thanks!

This issue was resolved after the ROCm 6.0 release. Thanks!