[Feature]: Further FP32 GEMM optimization for gfx11
littlewu2508 opened this issue · 2 comments
littlewu2508 commented
https://bugs.gentoo.org/891499#c46
A few months ago I tried to build the develop branch of rocBLAS with gfx1100 support, and one user ran some benchmarks. The results show that the 7900XTX performs well on FP16 and mixed-precision GEMM but poorly on FP32 GEMM. Judging from the git log, there is only basic GEMM support plus WMMA support so far; further optimization has not landed yet.
Library context
Software | version |
---|---|
rocblas | e44855972ff75053ba08922ca89ab288e6a9462e |
ppanchad-amd commented
@littlewu2508 Can you please test with the latest ROCm 6.1.2? If the issue no longer occurs, please close the ticket. Thanks!
littlewu2508 commented
This issue has been resolved since the ROCm 6.0 release. Thanks!