v11, line 20: why do we define a p but not use it? Line 29: why do we use a long type?
Arsmart123 opened this issue · 4 comments
Hi! I am learning your wonderful code but am puzzled by some of the assembly. Please excuse me, I am new to assembly. I searched some documentation and still find these two points confusing:
v11, line 20: why do we define a p but not use it?
line 29: why do we use a long type?
Thank you!!!
Give me the code path, please. how-to-optimize-gemm has multiple backends.
OK, OK. It is version 11: https://github.com/tpoisonooo/how-to-optimize-gemm/blob/master/cuda/MMult_cuda_11.cu
- `p` (see https://github.com/tpoisonooo/YHs_Sample/blob/b9717dfd483e6d59c031e20dc6808686e53aba94/cuda/gemm/sgemm.cu#L64) is the guard predicate for the load. I do not need to process the edge case (i.e. m % 128 != 0), so I removed the guard.
- The `ld.global.xx` instruction needs a 64-bit (long) address operand; if you use int32_t instead, you get a compilation error.
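To make the two answers concrete, here is a minimal sketch. It is not the code from MMult_cuda_11.cu or sgemm.cu: ldg32_guarded, ldg32_unguarded, guarded_copy, unguarded_copy and run_demo are hypothetical names, and the sketch assumes a 64-bit CUDA build (64-bit device pointers) and a plain ld.global.f32 instead of the .nc variants. In the guarded helper, p is set from the guard flag, so the load is skipped for out-of-range elements. The unguarded helper is what is left once the edge case is dropped: p is never set or tested, and the leftover declaration is harmless. Both pass the address through the "l" inline-asm constraint, which is a 64-bit .u64 register; as far as I know, handing ld.global a 32-bit "r" operand instead is rejected by ptxas in 64-bit mode.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    std::abort();
  }
}

// Hypothetical guarded load: predicate p = (guard != 0). If p is false the
// ld is skipped and reg keeps the 0.0f written by mov. The address uses the
// "l" constraint, i.e. a 64-bit (.u64) register.
__device__ __forceinline__ void ldg32_guarded(float &reg, const void *ptr,
                                              bool guard) {
  asm volatile(
      "{.reg .pred p;\n"
      " setp.ne.b32 p, %2, 0;\n"
      " mov.f32 %0, 0f00000000;\n"
      " @p ld.global.f32 %0, [%1];}\n"
      : "=f"(reg)
      : "l"(ptr), "r"((int)guard));
}

// Hypothetical unguarded load: no edge case, so p is never set or tested.
// Leaving the ".reg .pred p;" declaration in place is harmless.
__device__ __forceinline__ void ldg32_unguarded(float &reg, const void *ptr) {
  asm volatile(
      "{.reg .pred p;\n"
      " ld.global.f32 %0, [%1];}\n"
      : "=f"(reg)
      : "l"(ptr));
}

// Copies src[0..n) into dst[0..total); slots >= n become 0.
__global__ void guarded_copy(const float *src, float *dst, int n, int total) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= total) return;
  float v;
  // Out-of-range threads keep a valid address but skip the load.
  ldg32_guarded(v, src + (i < n ? i : 0), i < n);
  dst[i] = v;
}

// Assumes exactly n threads are launched (n a multiple of the block size).
__global__ void unguarded_copy(const float *src, float *dst) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  float v;
  ldg32_unguarded(v, src + i);
  dst[i] = v;
}

// Runs both kernels on host_src[0..n). Requires n > 0, n % 128 == 0
// (no edge case for the unguarded kernel) and total >= n.
void run_demo(const float *host_src, int n, int total,
              float *guarded_out, float *unguarded_out) {
  assert(n > 0 && n % 128 == 0 && total >= n);
  float *d_src, *d_g, *d_u;
  check(cudaMalloc(&d_src, n * sizeof(float)));
  check(cudaMalloc(&d_g, total * sizeof(float)));
  check(cudaMalloc(&d_u, n * sizeof(float)));
  check(cudaMemcpy(d_src, host_src, n * sizeof(float), cudaMemcpyHostToDevice));
  guarded_copy<<<(total + 127) / 128, 128>>>(d_src, d_g, n, total);
  unguarded_copy<<<n / 128, 128>>>(d_src, d_u);
  check(cudaGetLastError());
  check(cudaMemcpy(guarded_out, d_g, total * sizeof(float), cudaMemcpyDeviceToHost));
  check(cudaMemcpy(unguarded_out, d_u, n * sizeof(float), cudaMemcpyDeviceToHost));
  check(cudaFree(d_src));
  check(cudaFree(d_g));
  check(cudaFree(d_u));
}
```

With n = 256 and total = 300, guarded_out[0..255] and unguarded_out should both equal the input, and guarded_out[256..299] should be 0.0f, since those threads never issue the load.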
Oh, I see. Thank you!!!!