Multithreading problem caused by FFTW dependency
VPetukhov opened this issue · 3 comments
tl;dr if you use many threads, running FFTW.set_num_threads(1) can be a good idea. Otherwise FFTW probably slows down computation and prevents outer parallelism. I suggest adding it to the README.
Full explanation
I was running a lot of KDEs in a loop, but it turned out that running the code in parallel slows the process down. This happens even if I simply set JULIA_NUM_THREADS=20 (on a 56-core server) without using @threads:
using KernelDensity
using Base.Threads

interp_kde(coords::Array{Float64, 2}, bandwidth::Float64) =
    InterpKDE(kde((coords[1,:], coords[2,:]), bandwidth=(bandwidth, bandwidth)))

td = rand(2, 100000);

@time for i in 1:500
    interp_kde(td, 1.0)
end
It creates multiple threads, each at about 30% load, and takes 15.9 seconds. The same code with JULIA_NUM_THREADS=1 takes 7.5 seconds, running in a single thread as expected. The timing doesn't really change if I use @threads:
@time @threads for i in 1:500
    interp_kde(td, 1.0)
end
After some digging, the problem turned out to be in the FFTW package, which is called somewhere during interpolation and by default uses nthreads() * 4 threads inside its C code. To disable this you need to run FFTW.set_num_threads(1). After that, running with JULIA_NUM_THREADS=20 but without @threads takes 7.5 seconds, as it should, and with @threads it takes 0.5 seconds.
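For reference, here is a minimal sketch of the workaround applied to the same benchmark. It assumes FFTW is installed as a direct dependency of the environment, so that `using FFTW` works; otherwise the code is the example from this report with the single FFTW.set_num_threads(1) call added before the loop.

```julia
using KernelDensity
using FFTW            # assumption: FFTW is added directly to the environment
using Base.Threads

# Restrict FFTW to one internal thread so it does not compete
# with the Julia threads used by the outer loop.
FFTW.set_num_threads(1)

interp_kde(coords::Array{Float64, 2}, bandwidth::Float64) =
    InterpKDE(kde((coords[1, :], coords[2, :]), bandwidth=(bandwidth, bandwidth)))

td = rand(2, 100000);

# Outer-loop parallelism: each iteration runs an independent KDE.
@time @threads for i in 1:500
    interp_kde(td, 1.0)
end
```

Start Julia with JULIA_NUM_THREADS set to more than 1 for @threads to have any effect.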
I tried different run configurations, but in the end it looks like parallel FFTW improves on single-threaded FFTW only with large arrays (>500000) and a large number of iterations (>100). And it is always much worse than parallelizing the outer loop.
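A comparison like this could be scripted roughly as follows. This is only a sketch: time_kde is a hypothetical helper, not part of KernelDensity or FFTW, and the array sizes, iteration count, and thread counts are placeholder values chosen for illustration. It makes no claim about what timings it will produce.

```julia
using KernelDensity
using FFTW   # assumption: FFTW is added directly to the environment

# Hypothetical helper: time n_iter 2D KDE runs on n_points random points
# with FFTW limited to fftw_threads internal threads.
function time_kde(n_points::Int, n_iter::Int, fftw_threads::Int)
    FFTW.set_num_threads(fftw_threads)
    td = rand(2, n_points)
    x, y = td[1, :], td[2, :]
    kde((x, y), bandwidth=(1.0, 1.0))   # warm-up run, excludes compilation time
    return @elapsed for i in 1:n_iter
        kde((x, y), bandwidth=(1.0, 1.0))
    end
end

# Placeholder grid of configurations on either side of the sizes mentioned above.
for n_points in (100_000, 1_000_000), fftw_threads in (1, 4)
    t = time_kde(n_points, 100, fftw_threads)
    println("points=$n_points fftw_threads=$fftw_threads time=$(round(t, digits=2)) s")
end
```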
Which version of Julia and the package are you using?
Julia 1.3.1, FFTW v1.2.0, KernelDensity v0.5.1
@stevengj Any thoughts on this?