Why 2 kernels in armv8/MMult_4x4_13?
Sh-Zh-7 opened this issue · 4 comments
Sh-Zh-7 commented
Is that for performance reason or just comparison?
Sh-Zh-7 commented
Btw, in armv8/MMult1.c, although the arrays are already row-major, the multiplication is still done in column-major form, which produces the wrong answer.
You could modify the AddDot function like this:
void AddDot( int k, float *x, float *y, int incy, float *gamma ) {
    for ( int p = 0; p < k; p++ ) {
        *gamma += x[p] * y[p * incy];
    }
}
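For context, here is a minimal sketch of how that AddDot could be driven for row-major matrices. The A/B/C access macros, the MatMulRowMajor driver, and the lda/ldb/ldc row strides are assumptions made for illustration and are not copied from the repository. The point is that x is a contiguous row of A, while y walks down a column of B with stride ldb.

```c
/* Row-major element access; lda/ldb/ldc are row strides (assumed layout). */
#define A(i, j) a[(i) * lda + (j)]
#define B(i, j) b[(i) * ldb + (j)]
#define C(i, j) c[(i) * ldc + (j)]

/* AddDot as proposed above: x is a contiguous row of A,
   y steps through a column of B with stride incy. */
void AddDot(int k, float *x, float *y, int incy, float *gamma) {
    for (int p = 0; p < k; p++) {
        *gamma += x[p] * y[p * incy];
    }
}

/* Hypothetical driver: accumulates C += A * B, where A is m x k,
   B is k x n and C is m x n, all stored row-major. */
void MatMulRowMajor(int m, int n, int k, float *a, int lda,
                    float *b, int ldb, float *c, int ldc) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            /* Row i of A dotted with column j of B. */
            AddDot(k, &A(i, 0), &B(0, j), ldb, &C(i, j));
        }
    }
}
```

Since the driver accumulates into C, the caller is expected to zero C first if a plain product is wanted.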
tpoisonooo commented
> Is that for performance reason or just comparison?
Just comparison. Use #define KERNEL_4x4 kernel_4x4_v2 to change the kernel implementation.
This is bad code style; I will fix it later.
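A rough sketch of what such a macro switch could look like, assuming both kernels share one signature. Only KERNEL_4x4 and kernel_4x4_v2 appear in this thread; kernel_4x4_v1, block_multiply_4x4, the parameter list, and the scalar loop bodies are hypothetical stand-ins, not the repository's actual kernels.

```c
/* Hypothetical variant 1: 4x4 block of C += A * B with the k loop innermost.
   Row-major storage is assumed; lda/ldb/ldc are row strides. */
static void kernel_4x4_v1(int k, const float *a, int lda,
                          const float *b, int ldb, float *c, int ldc) {
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int p = 0; p < k; p++)
                c[i * ldc + j] += a[i * lda + p] * b[p * ldb + j];
}

/* Hypothetical variant 2: same arithmetic, but the innermost loop
   walks contiguous rows of B and C. */
static void kernel_4x4_v2(int k, const float *a, int lda,
                          const float *b, int ldb, float *c, int ldc) {
    for (int i = 0; i < 4; i++)
        for (int p = 0; p < k; p++) {
            float aip = a[i * lda + p];
            for (int j = 0; j < 4; j++)
                c[i * ldc + j] += aip * b[p * ldb + j];
        }
}

/* Pick the variant at compile time; override with -DKERNEL_4x4=kernel_4x4_v1. */
#ifndef KERNEL_4x4
#define KERNEL_4x4 kernel_4x4_v2
#endif

/* Call sites name only the macro, so swapping kernels needs no other edits. */
void block_multiply_4x4(int k, const float *a, int lda,
                        const float *b, int ldb, float *c, int ldc) {
    KERNEL_4x4(k, a, lda, b, ldb, c, ldc);
}
```

Keeping both variants behind one macro makes side-by-side benchmarking easy, at the cost of carrying code that is unused in any single build.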
tpoisonooo commented
done.
Sh-Zh-7 commented
Hello, I am currently slacking off and cannot reply to your email personally. I will get back to you as soon as I am done slacking off.