FFT c2c: interleaved vs. non-interleaved input/output
zviered commented
Hello,
I'm running a signal processing algorithm on a Cortex-A53.
The code is written in C using NEON intrinsics.
I measured the performance of two kernels: multiplying a complex matrix by a real (scalar) matrix, and element-wise multiplication of a complex float vector by a complex float vector.
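For the complex-by-real matrix product, a split layout is convenient because the real matrix has no imaginary part: Re(C) = Re(A)·B and Im(C) = Im(A)·B, so the product reduces to two independent real multiplies with no de-interleaving. A minimal plain-C sketch, assuming row-major storage and separate A_re/A_im arrays; `real_matmul` and `cmat_mul_real_split` are hypothetical names, and the naive triple loop stands in for whatever optimized kernel is actually used:

```c
#include <stddef.h>

/* Hypothetical helper: naive real matrix multiply C = A * B.
 * Row-major; A is m x k, B is k x n, C is m x n. */
static void real_matmul(const float *A, const float *B, float *C,
                        size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}

/* Hypothetical: complex m x k matrix (split into A_re, A_im) times a
 * real k x n matrix B. Since B has no imaginary part:
 *   Re(C) = Re(A) * B,   Im(C) = Im(A) * B
 * so each output plane is just one real matrix multiply. */
void cmat_mul_real_split(const float *A_re, const float *A_im,
                         const float *B,
                         float *C_re, float *C_im,
                         size_t m, size_t k, size_t n)
{
    real_matmul(A_re, B, C_re, m, k, n);
    real_matmul(A_im, B, C_im, m, k, n);
}
```

With an interleaved A, the same product would need the real and imaginary parts separated (for example with vld2q_f32) before the multiply and re-interleaved afterwards.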
It seems that when the input/output is interleaved (re0, im0, re1, im1, ...), performance is lower than with non-interleaved (separate real and imaginary arrays) input/output.
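To make the two layouts concrete, here is a plain-C sketch of converting between them; `deinterleave` and `interleave` are hypothetical helper names. Note that converting costs an extra pass over memory, so it would presumably only pay off if several split-layout operations are chained between conversions:

```c
#include <stddef.h>

/* Interleaved: buf = {re0, im0, re1, im1, ...}           (2*n floats)
 * Split:       re  = {re0, re1, ...}, im = {im0, im1, ...} (n floats each) */

/* Hypothetical: interleaved -> split */
void deinterleave(const float *buf, float *re, float *im, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        re[i] = buf[2 * i];
        im[i] = buf[2 * i + 1];
    }
}

/* Hypothetical: split -> interleaved */
void interleave(const float *re, const float *im, float *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        buf[2 * i]     = re[i];
        buf[2 * i + 1] = im[i];
    }
}
```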
In the interleaved case I'm using vld2q_f32 and vst2q_f32.
In the non-interleaved case I'm using vld1q_f32 and vst1q_f32.
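A sketch of the element-wise complex multiply in both layouts, using the load/store intrinsics named above plus vmulq_f32, vmlaq_f32 and vmlsq_f32. In the interleaved version, vld2q_f32/vst2q_f32 de-interleave and re-interleave as part of the load/store; in the split version plain vld1q_f32/vst1q_f32 suffice. The function names are hypothetical, the NEON path is compiled only when `__ARM_NEON` is defined, and a scalar loop handles the tail (or the whole array on non-ARM hosts). This sketch makes no claim about which version is faster on a given core:

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical: out = a * b, all holding n complex values interleaved
 * as {re0, im0, re1, im1, ...}. */
void cmul_interleaved(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        /* vld2q_f32 splits 8 floats: val[0] = 4 reals, val[1] = 4 imags */
        float32x4x2_t va = vld2q_f32(a + 2 * i);
        float32x4x2_t vb = vld2q_f32(b + 2 * i);
        float32x4x2_t vr;
        /* re = ar*br - ai*bi,  im = ar*bi + ai*br */
        vr.val[0] = vmlsq_f32(vmulq_f32(va.val[0], vb.val[0]), va.val[1], vb.val[1]);
        vr.val[1] = vmlaq_f32(vmulq_f32(va.val[0], vb.val[1]), va.val[1], vb.val[0]);
        vst2q_f32(out + 2 * i, vr); /* re-interleave on store */
    }
#endif
    for (; i < n; ++i) { /* scalar tail, or the whole loop without NEON */
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
}

/* Hypothetical: same product with separate real/imaginary arrays. */
void cmul_split(const float *a_re, const float *a_im,
                const float *b_re, const float *b_im,
                float *o_re, float *o_im, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t ar = vld1q_f32(a_re + i), ai = vld1q_f32(a_im + i);
        float32x4_t br = vld1q_f32(b_re + i), bi = vld1q_f32(b_im + i);
        vst1q_f32(o_re + i, vmlsq_f32(vmulq_f32(ar, br), ai, bi));
        vst1q_f32(o_im + i, vmlaq_f32(vmulq_f32(ar, bi), ai, br));
    }
#endif
    for (; i < n; ++i) {
        float ar = a_re[i], ai = a_im[i], br = b_re[i], bi = b_im[i];
        o_re[i] = ar * br - ai * bi;
        o_im[i] = ar * bi + ai * br;
    }
}
```

For example, (1+2i)(5+6i) gives -7+16i in either layout.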
Do you think it makes sense to create a c2c FFT that accepts non-interleaved input?
Thank you,
Zvika