CUDA version computes wrong results when m, n, k are not all equal
seth-lu opened this issue · 1 comments
seth-lu commented
For example, in kernel_v3:
float *begin_a = a + by * BLOCK * k; //by->n
float *begin_b = b + bx * BLOCK; //bx->m
This goes wrong when A and B are not square matrices, for example m=k=256, n=128.
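To show how the tile offsets depend on which dimension each block index walks, here is a minimal CPU sketch that emulates the block indexing. It is not the repository's kernel. It assumes row-major storage with A as m x k, B as k x n and C as m x n, which may differ from the layout kernel_v3 actually uses. `BLOCK = 16`, `lda`, `ldb`, `ldc`, `make_matrix`, `gemm_ref`, `gemm_tiled` and `max_abs_diff` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int BLOCK = 16;  // hypothetical tile edge, not taken from the repository

// Fill a rows x cols row-major matrix with small integer values so sums stay exact in float.
std::vector<float> make_matrix(int rows, int cols, int seed) {
    std::vector<float> v(static_cast<std::size_t>(rows) * cols);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            v[static_cast<std::size_t>(r) * cols + c] =
                static_cast<float>((r * 31 + c * 17 + seed) % 7 - 3);
    return v;
}

// Naive reference: C (m x n) = A (m x k) * B (k x n), all row-major (assumed layout).
std::vector<float> gemm_ref(const std::vector<float>& a, const std::vector<float>& b,
                            int m, int n, int k) {
    std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// CPU emulation of a tiled kernel: the (by, bx) loops stand in for blockIdx.y / blockIdx.x.
std::vector<float> gemm_tiled(const std::vector<float>& a, const std::vector<float>& b,
                              int m, int n, int k) {
    assert(m % BLOCK == 0 && n % BLOCK == 0 && k % BLOCK == 0);
    const int lda = k, ldb = n, ldc = n;  // row-major leading dimensions
    std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
    for (int by = 0; by < m / BLOCK; ++by) {        // by walks rows of A and C, so it ranges over m
        for (int bx = 0; bx < n / BLOCK; ++bx) {    // bx walks columns of B and C, so it ranges over n
            const float* begin_a = a.data() + by * BLOCK * lda;
            const float* begin_b = b.data() + bx * BLOCK;
            float* begin_c = c.data() + by * BLOCK * ldc + bx * BLOCK;
            for (int kt = 0; kt < k; kt += BLOCK) { // slide both tiles along the shared k dimension
                const float* ta = begin_a + kt;        // A tile moves right by kt columns
                const float* tb = begin_b + kt * ldb;  // B tile moves down by kt rows
                for (int ty = 0; ty < BLOCK; ++ty)
                    for (int tx = 0; tx < BLOCK; ++tx)
                        for (int p = 0; p < BLOCK; ++p)
                            begin_c[ty * ldc + tx] += ta[ty * lda + p] * tb[p * ldb + tx];
            }
        }
    }
    return c;
}

// Largest element-wise absolute difference between two equally sized matrices.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float d = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) d = std::fmax(d, std::fabs(x[i] - y[i]));
    return d;
}
```

Comparing `gemm_tiled` with `gemm_ref` on a non-square shape such as m=k=256, n=128 is the kind of check that catches swapped block indices or mismatched leading dimensions, which a test using only m=n=k cannot do.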
tpoisonooo commented
Version 3 is fixed. I got lost in the ugly leading dimensions.
Still, I need to add more CI and unit tests.