Does __fmaf really help?
Closed this issue · 1 comment
kwea123 commented
I see you replaced basically every operation with these intrinsics, but according to this there is zero speedup; they only serve to make rounding predictable.
MrNeRF commented
Yes, I've had the same experience. I tried it mainly because the NVIDIA CUDA profiler kept suggesting these operations to improve speed. In practice, though, it doesn't seem to help speed at all. Instead, it tends to obfuscate the code and make it harder to understand.
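
For anyone who wants to check this themselves, here is a minimal sketch of the comparison. It is not code from this repository: the names `madd_plain`, `madd_intrinsic`, `Mismatches` and `compare_madd` are made up for illustration. The likely reason for the missing speedup is that nvcc defaults to `--fmad=true`, so a plain `a * b + c` on floats is normally already contracted into a fused multiply-add. Under that assumption the two kernels should do the same work, and the intrinsic only pins down the rounding mode and stops the compiler from rearranging the operation.

```cuda
#include <cmath>
#include <cstddef>
#include <cuda_runtime.h>

// Plain expression. With nvcc's default --fmad=true the compiler is
// allowed to contract this into a single fused multiply-add.
__global__ void madd_plain(const float* a, const float* b, const float* c,
                           float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i] + c[i];
}

// Explicit intrinsic: always one fused multiply-add, round-to-nearest-even.
__global__ void madd_intrinsic(const float* a, const float* b, const float* c,
                               float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __fmaf_rn(a[i], b[i], c[i]);
}

// Hypothetical result type: number of elements that differ bitwise.
struct Mismatches {
    int plain_vs_intrinsic;  // plain kernel vs. intrinsic kernel
    int intrinsic_vs_host;   // intrinsic kernel vs. host std::fmaf
};

// Runs both kernels on the same inputs and counts differing results.
// Returns {-1, -1} if any CUDA call fails.
Mismatches compare_madd(int n) {
    float *a = nullptr, *b = nullptr, *c = nullptr;
    float *plain = nullptr, *fused = nullptr;
    std::size_t bytes = static_cast<std::size_t>(n) * sizeof(float);
    Mismatches m{-1, -1};

    // Managed memory keeps the sketch short; it is not a performance choice.
    bool ok = cudaMallocManaged(&a, bytes) == cudaSuccess &&
              cudaMallocManaged(&b, bytes) == cudaSuccess &&
              cudaMallocManaged(&c, bytes) == cudaSuccess &&
              cudaMallocManaged(&plain, bytes) == cudaSuccess &&
              cudaMallocManaged(&fused, bytes) == cudaSuccess;
    if (ok) {
        // Arbitrary deterministic inputs whose products are not exact in float.
        for (int i = 0; i < n; ++i) {
            a[i] = 0.1f * static_cast<float>(i % 97 + 1);
            b[i] = 1.0f / static_cast<float>(i % 89 + 1);
            c[i] = -0.01f * static_cast<float>(i % 83);
        }
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        madd_plain<<<blocks, threads>>>(a, b, c, plain, n);
        madd_intrinsic<<<blocks, threads>>>(a, b, c, fused, n);
        if (cudaDeviceSynchronize() == cudaSuccess) {
            m = {0, 0};
            for (int i = 0; i < n; ++i) {
                if (plain[i] != fused[i]) ++m.plain_vs_intrinsic;
                if (fused[i] != std::fmaf(a[i], b[i], c[i])) ++m.intrinsic_vs_host;
            }
        }
    }
    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(plain); cudaFree(fused);
    return m;
}
```

A small `main` that prints the two fields of `compare_madd(1 << 20)` is enough to try it. Building once with default flags and once with `--fmad=false` should show whether the plain kernel was being fused all along; the separate-rounding build may produce a nonzero `plain_vs_intrinsic` count. I have not included timings here, so treat this as a way to inspect the behavior rather than as a benchmark.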