Performance drop observed on SHOC DeviceMemory local memory tests on the ROCm stack
rpathani opened this issue · 0 comments
rpathani commented
According to the AMD developer who debugged the issue:
The test generates its kernels based on the device capabilities reported by the OpenCL (OCL) runtime. On the Hybrid stack (Orca), the OCL runtime reports 32KB of local device memory, whereas the ROCm stack reports 64KB.
The test uses half of the reported amount for the local array in a kernel. On ROCm this doubles the LDS usage per work-group, which lowers wave occupancy and therefore performance. The issue should be reported to devrel so that the test logic can be replaced.
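The arithmetic behind this explanation can be sketched as follows. This is only an illustration, not SHOC code: the function names are hypothetical, and the 64KB of physical LDS per compute unit is an assumption about the hardware, since the issue does not state it. The sketch also considers LDS as the only occupancy limit and ignores registers and other resources.

```python
KIB = 1024

def lds_per_workgroup(reported_local_mem_bytes):
    """Local array size the test picks: half of the reported local memory."""
    return reported_local_mem_bytes // 2

def workgroups_per_cu(lds_bytes_per_group, physical_lds_bytes=64 * KIB):
    """Upper bound on resident work-groups per compute unit from LDS alone.

    physical_lds_bytes is an assumed hardware value, not taken from the issue.
    """
    return physical_lds_bytes // lds_bytes_per_group

# Reported local memory sizes as described in the issue.
for stack, reported in (("Orca", 32 * KIB), ("ROCm", 64 * KIB)):
    used = lds_per_workgroup(reported)
    print(stack, used // KIB, "KB per work-group,",
          workgroups_per_cu(used), "work-groups per CU at most")
```

Under these assumptions, the same test kernel can keep half as many work-groups resident per compute unit on ROCm as on Orca, which is consistent with the lower occupancy described above.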