
The performance of the conv1d_op_f32_test operator is relatively poor

Zhao-Dongyu opened this issue · 1 comments

With a kernel height of 5, a stride of 2, and 16 output channels, I tested a range of input channel counts and input heights. Performance is poor whenever input_channel % 4 == 1 or input_channel % 4 == 2. Is this a problem with how I am using the operator, or a problem with the operator itself?
Screenshot 2023-12-29 11:05:43 AM

I am using v230.
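
For anyone who wants to check the pattern, the sweep described above could be sketched roughly as follows. This is only an illustration, not the actual test code: `naive_conv1d` is a pure-Python stand-in rather than the real operator, the helper names (`output_height`, `time_case`, `sweep`) are hypothetical, and no padding is assumed. Timings are normalized per multiply-accumulate so that different input channel counts can be compared, then grouped by `input_channel % 4`.

```python
import time
from collections import defaultdict
from statistics import median

KERNEL_H = 5   # kernel height from the report
STRIDE = 2     # stride from the report
OUT_CH = 16    # output channels from the report


def output_height(in_h, kernel_h=KERNEL_H, stride=STRIDE):
    """Output length of a 1-D convolution, assuming no padding."""
    return (in_h - kernel_h) // stride + 1


def naive_conv1d(x, w, stride=STRIDE):
    """Stand-in operator (not the real one).

    x is [in_h][in_ch], w is [out_ch][kernel_h][in_ch].
    Returns [out_h][out_ch].
    """
    in_h, kernel_h = len(x), len(w[0])
    out = []
    for o in range(output_height(in_h, kernel_h, stride)):
        row = []
        for f in w:
            acc = 0.0
            for k in range(kernel_h):
                acc += sum(a * b for a, b in zip(x[o * stride + k], f[k]))
            row.append(acc)
        out.append(row)
    return out


def time_case(op, in_ch, in_h, repeats=5):
    """Median wall-clock time of op for one (in_ch, in_h) shape."""
    x = [[1.0] * in_ch for _ in range(in_h)]
    w = [[[0.5] * in_ch for _ in range(KERNEL_H)] for _ in range(OUT_CH)]
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        op(x, w)
        samples.append(time.perf_counter() - start)
    return median(samples)


def sweep(op, channels, heights):
    """Return {in_ch % 4: [seconds per multiply-accumulate, ...]}."""
    by_residue = defaultdict(list)
    for in_ch in channels:
        for in_h in heights:
            macs = output_height(in_h) * OUT_CH * KERNEL_H * in_ch
            by_residue[in_ch % 4].append(time_case(op, in_ch, in_h) / macs)
    return dict(by_residue)


if __name__ == "__main__":
    results = sweep(naive_conv1d, channels=range(1, 17), heights=[32, 64])
    for residue in sorted(results):
        print(f"in_ch % 4 == {residue}: "
              f"median {median(results[residue]):.3e} s/MAC")
```

To reproduce the issue, `naive_conv1d` would be swapped for a wrapper around the real operator. If the residue 1 and 2 groups are clearly slower per multiply-accumulate than the residue 0 and 3 groups, that would match what I am seeing.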