Single threaded performance regression in FFT in Julia 0.5 RC3
mgr327 opened this issue · 11 comments
There seems to be a performance regression in fft as demonstrated below:
slowdown2.jl
function ch(nsteps, u)
    w = complex(u)
    p = plan_fft!(w)
    q = plan_ifft!(w)
    for n in 1:nsteps
        w = p*w
        w = q*w
    end
    w
end
srand(1)
u = rand(2^16)
ch(5, u)
@time ch(100, u)
Julia 0.4.6:
_% /usr/bin/julia slowdown2.jl
0.299310 seconds (246 allocations: 1.015 MB)
Julia 0.5 RC3:
_% /usr/local/julia/bin/julia slowdown2.jl
3.104827 seconds (271 allocations: 1.015 MB)
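For scale, the two timings above correspond to roughly a tenfold slowdown. A quick check, using only the numbers printed above (the variable names are just for illustration):

```julia
# Timings copied from the two runs above, in seconds.
t_046 = 0.299310   # Julia 0.4.6
t_rc3 = 3.104827   # Julia 0.5 RC3, official binary

# Ratio of the slow run to the fast run, rounded to one decimal place.
slowdown = t_rc3 / t_046
println(round(slowdown * 10) / 10)   # prints 10.4
```

The allocation counts are nearly identical in both runs, so the difference is in compute time rather than memory behavior.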
Here is the versioninfo:
Julia 0.4.6:
_% /usr/bin/julia -e 'versioninfo()'
Julia Version 0.4.6
Commit 2e358ce* (2016-06-19 17:16 UTC)
Platform Info:
System: Linux (x86_64-redhat-linux)
CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
WORD_SIZE: 64
BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Nehalem)
LAPACK: libopenblasp.so.0
LIBM: libopenlibm
LLVM: libLLVM-3.3
Julia 0.5 RC3:
_% /usr/local/julia/bin/julia -e 'versioninfo()'
Julia Version 0.5.0-rc3+0
Commit e6f843b (2016-08-22 23:43 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, westmere)
In the example above, both Julia 0.4.6 and Julia 0.5 RC3 run single-threaded (per the result reported by
'top -H' for 'ch(10000, u)'). So the problem seems to be different from the one discussed in #17000.
#17000 discusses two different issues, but only the first one is mentioned in the title of #17000:
a) enabling multithreading by default; this needs a decision
b) lack of optimization of FFTW due to buildbot misconfiguration; this just needs to be fixed, and perhaps backported to 0.4 and 0.5
Perhaps the title of this issue could be changed to:
"Single threaded performance regression in FFT in Julia 0.5 RC3"?
Furthermore, could you check whether you see the same performance regression if you compile Julia and its dependencies from source instead of using the binary?
See the comment I linked (and the few below it). Check unsafe_string(cglobal((:fftw_cc, FFTW.libfftw), UInt8)) first before recompiling stuff.
If we're talking about single-threaded performance, then this is separate from #17000. And if it's entirely due to the buildbot configuration issue preventing the optimization flags, then that should be already fixed (for all branches, no need to backport anything) but we haven't had new binaries created to verify that yet.
With regard to the questions asked earlier:
(a) Self-compiled Julia 0.5 RC3 is as fast as Julia 0.4.6:
git clone git://github.com/JuliaLang/julia.git
cd julia/
git checkout release-0.5
make
_% /scratch/julia/julia -e 'versioninfo()'
Julia Version 0.5.0-rc3+0
Commit e6f843b (2016-08-22 23:43 UTC)
Platform Info:
System: Linux (x86_64-redhat-linux)
CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, westmere)
/scratch/julia/julia slowdown2.jl
0.273984 seconds (271 allocations: 1.015 MB)
For reference, the downloaded "official" RC3 binaries show the regression:
_% /usr/local/julia/bin/julia slowdown2.jl
3.083323 seconds (271 allocations: 1.015 MB)
(b) Compilation flags:
Julia 0.4.6
_% /usr/bin/julia -e '@printf "%s\n" bytestring(cglobal((:fftw_cc, FFTW.libfftw), UInt8))'
gcc -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic
Self-compiled Julia 0.5 RC3
_% /scratch/julia/julia -e '@printf "%s\n" unsafe_string(cglobal((:fftw_cc, FFTW.libfftw), UInt8))'
gcc -m64 -O3 -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns -ffast-math
"Official" RC3 binaries:
_% /usr/local/julia/bin/julia -e '@printf "%s\n" unsafe_string(cglobal((:fftw_cc, FFTW.libfftw), UInt8))'
gcc -march=x86-64 -m64 -I/home/centos/local/include
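The main difference between these flag strings is that the official RC3 build has no -O optimization level at all. A minimal sketch of how one might check a reported fftw_cc string for that; `has_optimization_flag` is a hypothetical helper written for this illustration, not part of Julia or FFTW:

```julia
# Hypothetical helper (not part of Julia or FFTW): returns true if a compiler
# command line contains an optimizing -O level (-O1, -O2, -O3, -Os or -Ofast).
function has_optimization_flag(cc)
    for tok in split(cc)
        if match(r"^-O([1-3]|s|fast)$", tok) !== nothing
            return true
        end
    end
    return false
end

# Flag strings copied from the outputs above.
official_rc3 = "gcc -march=x86-64 -m64 -I/home/centos/local/include"
self_built   = "gcc -m64 -O3 -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns -ffast-math"

println(has_optimization_flag(official_rc3))  # false
println(has_optimization_flag(self_built))    # true
```

The same function could be applied directly to the string returned by the cglobal call shown above, assuming FFTW is loaded in the running Julia session.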
cc @staticfloat and xref JuliaCI/julia-buildbot#51
I've pushed a new configuration change that should clear out CFLAGS and CPPFLAGS on the builders, explicitly for the make step via the buildbot. Looking at the environment variables printed at the top of every step's logfile, the currently running build is looking good.
EDIT: Nope, did it wrong, building anew, with a nuke to ensure FFTW is rebuilt.
BAM. It's working:
$ ./julia-e6f843b073/bin/julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-rc3+0 (2016-08-22 23:43 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-unknown-linux-gnu
julia> @printf "%s\n" unsafe_string(cglobal((:fftw_cc, FFTW.libfftw), UInt8))
gcc -march=x86-64 -m64 -O3 -fomit-frame-pointer -mtune=native -malign-double -fstrict-aliasing -fno-schedule-insns -ffast-math
This particular build is available here. My personal feeling is that this isn't worth doing a new RC3 binary for, and we'll just let this roll out with RC4.
@mgr327 Thank you for your attention to detail here!
Is there anything left to do here, now that the buildbots are fixed?