projectNe10/Ne10

FFT c2c: interleaved vs. non-interleaved in/out

Hello,

I'm running a signal processing algorithm on a Cortex-A53.
The code is written in C with NEON intrinsics.

I measured the performance of multiplying a complex matrix by a scalar matrix, and of the scalar multiply of a complex float vector by a complex float vector.
It seems that when the in/out is interleaved (re0, im0, re1, im1, ...), performance is lower than with non-interleaved in/out.
For the interleaved case I'm using vld2q_f32 and vst2q_f32.
For the non-interleaved case I'm using vld1q_f32 and vst1q_f32.
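
For reference, the two access patterns might look like the sketch below. It is only an illustration under stated assumptions, not the measured code: it assumes an element-wise complex multiply and a length that is a multiple of 4, and the function names `cmul_interleaved` and `cmul_split` are hypothetical and not part of Ne10. A scalar fallback is included so the sketch also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Interleaved layout: a = {re0, im0, re1, im1, ...}.
 * n is the number of complex elements (assumed to be a multiple of 4). */
void cmul_interleaved(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON)
    for (size_t i = 0; i < n; i += 4) {
        /* vld2q_f32 de-interleaves on load: val[0] = re, val[1] = im */
        float32x4x2_t va = vld2q_f32(a + 2 * i);
        float32x4x2_t vb = vld2q_f32(b + 2 * i);
        float32x4x2_t vo;
        /* re = a.re*b.re - a.im*b.im */
        vo.val[0] = vmlsq_f32(vmulq_f32(va.val[0], vb.val[0]), va.val[1], vb.val[1]);
        /* im = a.re*b.im + a.im*b.re */
        vo.val[1] = vmlaq_f32(vmulq_f32(va.val[0], vb.val[1]), va.val[1], vb.val[0]);
        /* vst2q_f32 re-interleaves on store */
        vst2q_f32(out + 2 * i, vo);
    }
#else
    /* Scalar fallback for targets without NEON */
    for (size_t i = 0; i < n; ++i) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
#endif
}

/* Non-interleaved (split) layout: separate real and imaginary arrays.
 * n is the number of complex elements (assumed to be a multiple of 4). */
void cmul_split(const float *a_re, const float *a_im,
                const float *b_re, const float *b_im,
                float *o_re, float *o_im, size_t n)
{
    assert(n % 4 == 0);
#if defined(__ARM_NEON)
    for (size_t i = 0; i < n; i += 4) {
        /* Plain contiguous loads, no de-interleaving needed */
        float32x4_t ar = vld1q_f32(a_re + i);
        float32x4_t ai = vld1q_f32(a_im + i);
        float32x4_t br = vld1q_f32(b_re + i);
        float32x4_t bi = vld1q_f32(b_im + i);
        vst1q_f32(o_re + i, vmlsq_f32(vmulq_f32(ar, br), ai, bi));
        vst1q_f32(o_im + i, vmlaq_f32(vmulq_f32(ar, bi), ai, br));
    }
#else
    /* Scalar fallback for targets without NEON */
    for (size_t i = 0; i < n; ++i) {
        o_re[i] = a_re[i] * b_re[i] - a_im[i] * b_im[i];
        o_im[i] = a_re[i] * b_im[i] + a_im[i] * b_re[i];
    }
#endif
}
```

The arithmetic is the same in both functions; only the load and store instructions differ, which is the part the measurement above is about.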

Do you think it makes sense to create a c2c FFT that accepts non-interleaved input?

Thank you,
Zvika