tpoisonooo/how-to-optimize-gemm

Why 2 kernels in armv8/MMult_4x4_13?

Sh-Zh-7 opened this issue · 4 comments

Is that for performance reasons or just for comparison?

Btw, in armv8/MMult1.c, although the arrays are already row-major, the multiplication is still done in column-major form, which gives the wrong answer.

You could modify the AddDot function like this:

/* x: a row of A (contiguous); y: a column of B, stepping incy floats per element. */
void AddDot( int k, float *x, float *y, int incy, float *gamma ) {
  for ( int p=0; p<k; p++ ) {
    *gamma += x[p] * y[p * incy];
  }
}
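To show how this would be called, here is a minimal sketch of a row-major multiply built on that AddDot. The driver name MMult_rowmajor and the lda/ldb/ldc parameters are my own assumptions for illustration, not the signatures used in armv8/MMult1.c, and I added const to the inputs.

```c
/* Dot product of a contiguous row of A with a strided column of B. */
void AddDot(int k, const float *x, const float *y, int incy, float *gamma) {
    for (int p = 0; p < k; p++) {
        *gamma += x[p] * y[p * incy];
    }
}

/* Hypothetical driver: C (m x n) += A (m x k) * B (k x n).
   All three matrices are row-major; lda, ldb, ldc are row strides in floats. */
void MMult_rowmajor(int m, int n, int k,
                    const float *a, int lda,
                    const float *b, int ldb,
                    float *c, int ldc) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            /* Row i of A starts at a[i * lda]; column j of B starts at b[j]
               and advances ldb floats per step. */
            AddDot(k, &a[i * lda], &b[j], ldb, &c[i * ldc + j]);
        }
    }
}
```

With A = [[1,2,3],[4,5,6]] and B = [[7,8],[9,10],[11,12]], this should give C = [[58,64],[139,154]] when C starts at zero.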

> Is that for performance reasons or just for comparison?

Just for comparison. Use #define KERNEL_4x4 kernel_4x4_v2 to switch the kernel implementation.
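As a rough illustration of that kind of macro switch, here is a sketch with two stand-in kernels. The kernel bodies, their signatures, the kernel_4x4_v1 name, and the MMult_4x4 driver are assumptions made up for this example; the real kernels in armv8/MMult_4x4_13 are different. Only the #define KERNEL_4x4 pattern is taken from the comment above.

```c
/* Hypothetical 4x4 micro-kernels:
   C[0..3][0..3] += A[0..3][0..k-1] * B[0..k-1][0..3].
   All matrices are row-major; lda, ldb, ldc are row strides in floats. */
void kernel_4x4_v1(int k, const float *a, int lda,
                   const float *b, int ldb, float *c, int ldc) {
    /* Variant 1: one dot product per output element. */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int p = 0; p < k; p++)
                c[i * ldc + j] += a[i * lda + p] * b[p * ldb + j];
}

void kernel_4x4_v2(int k, const float *a, int lda,
                   const float *b, int ldb, float *c, int ldc) {
    /* Variant 2: accumulate rank-1 updates, so B is read row by row. */
    for (int p = 0; p < k; p++)
        for (int i = 0; i < 4; i++) {
            float aip = a[i * lda + p];
            for (int j = 0; j < 4; j++)
                c[i * ldc + j] += aip * b[p * ldb + j];
        }
}

/* Compile-time switch; override with -DKERNEL_4x4=kernel_4x4_v1. */
#ifndef KERNEL_4x4
#define KERNEL_4x4 kernel_4x4_v2
#endif

/* Hypothetical driver; assumes m and n are multiples of 4. */
void MMult_4x4(int m, int n, int k,
               const float *a, int lda,
               const float *b, int ldb,
               float *c, int ldc) {
    for (int i = 0; i < m; i += 4)
        for (int j = 0; j < n; j += 4)
            KERNEL_4x4(k, &a[i * lda], lda, &b[j], ldb, &c[i * ldc + j], ldc);
}
```

Because the driver only ever names KERNEL_4x4, swapping implementations for a benchmark run is a one-line change or a compiler flag, and both variants stay compiled so they can be checked against each other.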

This is bad code style; I'll fix it later.

Done.