tpoisonooo/how-to-optimize-gemm

about ldg32_nc_0

YijiaZhao opened this issue · 3 comments

https://github.com/tpoisonooo/how-to-optimize-gemm/blob/master/cuda/MMult_cuda_12.cu, lines 20 and 21
I'm a beginner with CUDA and PTX. What are these two PTX lines used for?
```
"{.reg .pred p;\n"
"mov.b32 %0, 0;\n"
```
Are they useless code?

Yes, `.reg .pred p;` is useless here. It was originally there for a predicate guard, which handles conditional execution (the load only runs when a condition holds).
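For comparison, here is a minimal sketch of what the code looks like when the predicate is actually used. This is not the repo's code: `ldg32_nc_guarded` and `guarded_copy` are hypothetical names, and `ld.global.nc` assumes an sm_35 or newer GPU. The predicate `p` is set from a `guard` argument, the output is first zeroed with `mov.b32 %0, 0`, and the load is executed only when `p` is true, so a skipped load returns 0.

```cuda
#include <cstdint>

// Hypothetical helper (not from the repo): 32-bit non-coherent global load
// that executes only when `guard` is true. If the load is predicated off,
// the result keeps the value written by `mov.b32 %0, 0`, i.e. 0.
__device__ __forceinline__ float ldg32_nc_guarded(const float* ptr, bool guard) {
    uint32_t reg;
    asm volatile(
        "{.reg .pred p;\n"                   // predicate register, scoped to this block
        " setp.ne.b32 p, %2, 0;\n"           // p = (guard != 0)
        " mov.b32 %0, 0;\n"                  // default value when the load is skipped
        " @p ld.global.nc.b32 %0, [%1];}\n"  // load only when p is true
        : "=r"(reg)
        : "l"(ptr), "r"(static_cast<int>(guard)));
    return __uint_as_float(reg);
}

// Hypothetical kernel: threads with i >= n form an address but never
// dereference it; they write 0 instead.
__global__ void guarded_copy(const float* in, float* out, int n, int n_out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_out) out[i] = ldg32_nc_guarded(in + i, i < n);
}
```

Launching `guarded_copy` with 8 threads over a 4-element input copies the 4 values and fills the remaining 4 outputs with 0, which is exactly the combination of predicate guard and zero-initialization that the original two lines came from.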

`mov.b32 %0, 0` clears the output register (sets it to 0) before the load. If you don't need that, just remove it.
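With both lines removed, a sketch of the unguarded load reduces to a single instruction. When the load is not predicated it always writes `%0`, so zeroing the register first has no effect. The names `ldg32_nc_plain` and `plain_copy` are hypothetical, and sm_35 or newer is assumed for `ld.global.nc`.

```cuda
// Hypothetical minimal variant: no predicate guard, so the load always
// writes %0 and neither `.reg .pred p;` nor `mov.b32 %0, 0;` is needed.
__device__ __forceinline__ float ldg32_nc_plain(const float* ptr) {
    float v;
    asm volatile("ld.global.nc.f32 %0, [%1];\n" : "=f"(v) : "l"(ptr));
    return v;
}

// Bounds checking now happens in C++ instead of inside the PTX.
__global__ void plain_copy(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = ldg32_nc_plain(in + i);
}
```

CUDA's built-in `__ldg()` intrinsic emits the same kind of non-coherent (read-only cache) load without any inline assembly, which may be simpler if you don't need cache hints.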

Thank you!