ROCm/Tensile

Enabling UnrollLoopEfficiencyEnable leads to crash during kernel generation

NevesLucas opened this issue · 3 comments

Setting UnrollLoopEfficiencyEnable to true leads to an index-out-of-bounds error in KernelWriter.py.

It occurs when the criteria are met (prefetch local off, TT 4,4 / 6,6 / etc.) and the datatype is half. It works correctly when the datatype is single.

The out-of-bounds error occurs here:
https://github.com/ROCmSoftwarePlatform/Tensile/blob/0a24e4d114f42dcc207317ded38a0dce4d438cbe/Tensile/KernelWriter.py#L2582-L2592

@nakajee could you take a look at this? It seems you were the last person to touch this code. Thanks.

The quick answer is that UnrollLoopEfficiencyEnable does not support datatype = half.
I am planning to add some code to reject this combination.
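A minimal sketch of what rejecting this combination could look like, assuming solution parameters are available as a plain dict. The function name `validate_unroll_efficiency`, the constant `UNSUPPORTED_FOR_UNROLL_EFFICIENCY`, and the `"DataType"` key are hypothetical illustrations, not Tensile's actual validation code. Only the half/single behavior comes from this thread.

```python
from typing import Any, Dict, Tuple

# Hypothetical: data types assumed NOT to work with UnrollLoopEfficiencyEnable.
# Per this thread, "half" crashes and "single" works; other types are not covered.
UNSUPPORTED_FOR_UNROLL_EFFICIENCY = {"half"}


def validate_unroll_efficiency(solution: Dict[str, Any]) -> Tuple[bool, str]:
    """Return (valid, reason) for a solution parameter dict.

    Hypothetical keys: "UnrollLoopEfficiencyEnable" (bool) and "DataType" (str).
    """
    enabled = bool(solution.get("UnrollLoopEfficiencyEnable", False))
    data_type = str(solution.get("DataType", "")).lower()
    if enabled and data_type in UNSUPPORTED_FOR_UNROLL_EFFICIENCY:
        # Reject up front instead of letting kernel generation hit an index error.
        return False, "UnrollLoopEfficiencyEnable does not support DataType=%s" % data_type
    return True, ""


if __name__ == "__main__":
    for dt in ("single", "half"):
        ok, why = validate_unroll_efficiency(
            {"UnrollLoopEfficiencyEnable": True, "DataType": dt}
        )
        print(dt, ok, why)
```

Checking the combination before kernel generation means an unsupported configuration is skipped with a clear reason rather than crashing inside KernelWriter.py.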

The fix has been submitted to the develop branch.