OpenMP parallel for
linusmartensson opened this issue · 2 comments
I've been using SimpleFFT in a project for a while now, and I'd like to share a few details I've modified in our local version.
Most importantly, "#pragma omp parallel for" incurs overhead for setting up a multi-threaded context and passing iteration ranges to threads. I've never had good results with it in innermost loops, which is how SimpleFFT uses it right now.
"#pragma omp simd" is more than likely a suitable alternative. Rather than setting up a multithreaded context, simd asks the compiler to vectorize the loop, leaving the multithreading (and its overhead) to outer contexts instead.
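To illustrate the split I mean, here is a minimal sketch. It is not SimpleFFT code: scale_rows and its parameters are hypothetical names made up for this example, and the loop body is a stand-in for whatever the inner loop actually computes. The thread-level pragma sits on the outer loop and the simd pragma on the innermost one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SimpleFFT): scales a row-major
// rows x cols matrix in place by a constant factor.
void scale_rows(std::vector<float>& data, std::size_t rows, std::size_t cols,
                float factor)
{
    assert(data.size() == rows * cols);
    float* p = data.data();

    // Thread-level parallelism on the outer loop: each thread handles whole
    // rows, so the parallel region is set up once per call rather than once
    // per inner loop.
    #pragma omp parallel for
    for (std::ptrdiff_t r = 0; r < static_cast<std::ptrdiff_t>(rows); ++r) {
        float* row = p + static_cast<std::size_t>(r) * cols;

        // Vectorization hint on the innermost loop: no threads are created
        // here, the compiler is only asked to emit SIMD instructions.
        #pragma omp simd
        for (std::size_t c = 0; c < cols; ++c) {
            row[c] *= factor;
        }
    }
}
```

Without OpenMP enabled, the pragmas are ignored and the function runs serially with the same result. With GCC or clang, -fopenmp enables both pragmas, while -fopenmp-simd enables only the simd one and does not pull in the threading runtime.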
With this change, I've also had positive results enabling the flag in clang; I noticed there was an ifdef disabling it for that environment.
Thank you for the input. Maybe some day I'll get to look into this more closely, but I won't promise that.