Enabling UnrollLoopEfficiencyEnable leads to crash during kernel generation
NevesLucas opened this issue · 3 comments
NevesLucas commented
Setting UnrollLoopEfficiencyEnable to true leads to an index-out-of-bounds error in KernelWriter.py.
It occurs when the enabling criteria are met (prefetch local off, TT 4,4 / 6,6 / etc.) and the data type is half. It works correctly when the data type is single.
The out-of-bounds error occurs here:
https://github.com/ROCmSoftwarePlatform/Tensile/blob/0a24e4d114f42dcc207317ded38a0dce4d438cbe/Tensile/KernelWriter.py#L2582-L2592
benjaminulmer commented
@nakajee could you take a look at this? Seems you were the last person to touch this code. Thanks.
nakajee commented
The quick answer is that UnrollLoopEfficiencyEnable does not support DataType = half.
I am planning to add some code to reject this combination.
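Rejecting the combination amounts to a validity check that runs before kernel generation. An unsupported parameter set is then filtered out instead of reaching KernelWriter.py. Below is a minimal Python sketch. It assumes solution parameters are held in a plain dict. The function names, the "DataType" key and its string values are hypothetical and are not Tensile's actual API or the submitted fix.

```python
# Hypothetical sketch, not Tensile's real code: parameters are assumed to
# live in a plain dict, and "DataType" / "half" are illustrative values.

def check_unroll_loop_efficiency(state: dict) -> tuple:
    """Return (valid, reason) for one set of solution parameters."""
    if state.get("UnrollLoopEfficiencyEnable") and state.get("DataType") == "half":
        return False, "UnrollLoopEfficiencyEnable does not support DataType = half"
    return True, ""


def filter_valid(states: list) -> list:
    """Keep only parameter sets that pass the check, reporting the rejects."""
    valid = []
    for s in states:
        ok, reason = check_unroll_loop_efficiency(s)
        if ok:
            valid.append(s)
        else:
            print(f"rejected {s}: {reason}")
    return valid
```

For example, with three candidates, the half/enabled combination is dropped. The single-precision and disabled cases are kept:

```python
candidates = [
    {"UnrollLoopEfficiencyEnable": True, "DataType": "half"},
    {"UnrollLoopEfficiencyEnable": True, "DataType": "single"},
    {"UnrollLoopEfficiencyEnable": False, "DataType": "half"},
]
kept = filter_valid(candidates)
```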
nakajee commented
The fix has been submitted to the develop branch.