tpoisonooo/how-to-optimize-gemm

v11, line 20: why do we define p but not use it? Line 29: why do we use the long type?

Arsmart123 opened this issue · 4 comments

Hi! I am studying your wonderful code but am puzzled by some of the assembly. Please excuse me, I am new to assembly. I searched through some documentation and still find these two points strange:
v11, line 20: why do we define p but not use it?
Line 29: why do we use the long type?

Thank you!!!

Please give me the code path. how-to-optimize-gemm has multiple backends.

OK. It is version 11: https://github.com/tpoisonooo/how-to-optimize-gemm/blob/master/cuda/MMult_cuda_11.cu

  1. See https://github.com/tpoisonooo/YHs_Sample/blob/b9717dfd483e6d59c031e20dc6808686e53aba94/cuda/gemm/sgemm.cu#L64. There, p is for the lock guard. I do not need to process the edge case (a.k.a. m%128 < 128), so I removed the guard.

  2. The ld.global.xx instruction needs a long type for its address operand. If you use int32_t, you get a compilation error.
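
Both points can be illustrated with a small sketch. This is not the code from MMult_cuda_11.cu: the names `ldg32_guarded`, `ldg32`, `copy_kernel`, and `run_demo` are made up for illustration, and a 64-bit build is assumed. In the guarded version, `p` is a predicate register set from the `guard` argument, and the `@p` prefix makes the load run only when `p` is true. In the unguarded version the predicate is gone entirely. In both, the address is passed with the `"l"` constraint, which binds a 64-bit register, while `"r"` binds a 32-bit one.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Guarded load: p is set from `guard`, and the load executes only if p is true.
// "+f" keeps the previous value of `reg` when the load is skipped.
__device__ __forceinline__ void ldg32_guarded(float &reg, const void *ptr, bool guard) {
  asm volatile(
      "{\n"
      "  .reg .pred p;\n"
      "  setp.ne.b32 p, %2, 0;\n"
      "  @p ld.global.f32 %0, [%1];\n"
      "}\n"
      : "+f"(reg)
      : "l"(ptr), "r"((int)guard));  // "l": 64-bit address, "r": 32-bit flag
}

// Unguarded load: no predicate at all, but the address still uses "l".
__device__ __forceinline__ void ldg32(float &reg, const void *ptr) {
  asm volatile("ld.global.f32 %0, [%1];\n" : "=f"(reg) : "l"(ptr));
}

__global__ void copy_kernel(const float *in, float *out_guarded, float *out_plain, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;

  float g = -1.0f;                  // value kept when the guard is false
  ldg32_guarded(g, in + i, i < n);  // out-of-range threads skip the load
  out_guarded[i] = g;

  if (i < n) {                      // without a guard, the caller must stay in range
    float v;
    ldg32(v, in + i);
    out_plain[i] = v;
  }
}

// Runs the kernel on 100 elements with 128 threads and counts mismatches.
int run_demo() {
  const int n = 100, threads = 128;
  std::vector<float> h_in(n), h_g(threads), h_p(threads);
  for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

  float *d_in, *d_g, *d_p;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_g, threads * sizeof(float));
  cudaMalloc(&d_p, threads * sizeof(float));
  cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemset(d_p, 0, threads * sizeof(float));

  copy_kernel<<<1, threads>>>(d_in, d_g, d_p, n);
  cudaMemcpy(h_g.data(), d_g, threads * sizeof(float), cudaMemcpyDeviceToHost);
  cudaMemcpy(h_p.data(), d_p, threads * sizeof(float), cudaMemcpyDeviceToHost);

  int bad = 0;
  for (int i = 0; i < threads; ++i) {
    float expect_g = i < n ? static_cast<float>(i) : -1.0f;
    float expect_p = i < n ? static_cast<float>(i) : 0.0f;
    if (h_g[i] != expect_g || h_p[i] != expect_p) ++bad;
  }
  cudaFree(d_in);
  cudaFree(d_g);
  cudaFree(d_p);
  return bad;
}

int main() {
  int bad = run_demo();
  printf("mismatches: %d\n", bad);
  return bad != 0;
}
```

If every matrix dimension is a multiple of the tile size, no thread ever reads out of range, so the unguarded form is enough. A leftover `.reg .pred p;` declaration with no `setp` or `@p` is then harmless and can be deleted.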

oh, I see. Thank you!!!!