qojulia/QuantumOptics.jl

FFT performance issue

Lightup1 opened this issue · 1 comment

This started as a performance test:

```julia
using QuantumOptics, BenchmarkTools, LinearAlgebra, MKL
using FFTW
# FFTW.set_provider!("mkl")
# FFTW.set_provider!("fftw")
FFTW.set_num_threads(6)  # affects plans created after this call (1 for the single-thread run)
##
b1 = PositionBasis(-1, 1, 2^14)
b2 = MomentumBasis(b1)
##
# FFT-based operator mapping position-basis kets to momentum-basis kets
Tpx_test = QuantumOptics.transform(b2, b1)
ppsi = Ket(b2, rand(ComplexF64, length(b2)))
psi = Ket(b1, rand(ComplexF64, length(b1)))
@benchmark QuantumOpticsBase.mul!($ppsi, $Tpx_test, $psi)
##
# Plain out-of-place FFT of a vector of the same length, for comparison
p1 = plan_fft(rand(ComplexF64, 2^14))
data1 = rand(ComplexF64, 2^14)
data2 = rand(ComplexF64, 2^14)
@benchmark mul!($data2, $p1, $data1)
```
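The 1-thread and 6-thread results below appear to come from separate runs with different `FFTW.set_num_threads` values. The following is a minimal sketch for collecting both in one session. It is restricted to the plain-vector FFT so that it needs only FFTW, BenchmarkTools and LinearAlgebra. `fft_thread_sweep` is a hypothetical helper, not part of any of these packages. Because FFTW's thread count applies to plans created after the call, the sketch rebuilds the plan for each thread count.

```julia
using FFTW, BenchmarkTools, LinearAlgebra

# Hypothetical helper: time an out-of-place FFT of length n for each
# FFTW thread count in `threads`. Returns thread count => minimum time (s).
function fft_thread_sweep(n::Integer, threads)
    x = rand(ComplexF64, n)
    y = similar(x)
    times = Dict{Int,Float64}()
    for t in threads
        FFTW.set_num_threads(t)   # only affects plans created afterwards
        p = plan_fft(x)           # re-plan so the new thread count is used
        times[t] = @belapsed mul!($y, $p, $x)
    end
    return times
end

times = fft_thread_sweep(2^14, (1, 6))
for (t, s) in sort(collect(times))
    println(t, " thread(s): ", round(s * 1e6; digits = 1), " μs")
end
```

The same loop could wrap the `Ket` benchmark, with `QuantumOptics.transform` called inside the loop so its internal plan also picks up the new thread count.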

1 thread:

```
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  81.200 μs …  1.018 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     87.300 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   92.110 μs ± 28.820 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▇█▅█▆▆▅▅▄▃▃▂▂▁▁  ▁                                         ▂
  █████████████████████▇▇█▇██▇█▇██▇▇█▇▇▇▇▇▇▇▆▅▆▆▃▆▆▅▄▄▂▅▂▃▃▃▄ █
  81.2 μs      Histogram: log(frequency) by time       159 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

6 threads:

```
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  57.700 μs … 769.900 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     75.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   74.873 μs ±  10.677 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                 ▁ ▂▄       ▃▆█▁
  ▁▁▂▂▁▁▁▁▂▁▁▁▁▂▇█████▆▄▄▄▆█████▆▅▄▅▅▇▆▅▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  57.7 μs         Histogram: frequency by time         96.6 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

The odd thing is that the FFT on a plain vector is much faster than the FFT applied to the `Ket` (median 35.6 μs versus 75.0 μs for the 6-thread `Ket` run):

```
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  23.600 μs … 800.300 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     35.600 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   36.250 μs ±   8.544 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              █▂ █▂▆▄ 
  ▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▆██▇████▇▇▄▃▃▃▄▃▅▅▄▄▃▃▂▂▂▁▂▁▁▁▁▁▁ ▂
  23.6 μs         Histogram: frequency by time         45.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

After checking the code, I think the difference may be caused by the scaling operation.
I'll close the issue.
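One way to test the scaling hypothesis is to time the plain FFT against the same FFT followed by one elementwise multiplication pass over the output. This is only a sketch. The exact factors that `QuantumOptics.transform` applies are not reproduced here, so `phase` (a random unit-modulus vector) and `s` (a 1/√N normalization) are hypothetical stand-ins, and `fft_scaled!` is a hypothetical helper. If the second timing lands close to the `Ket` timing, an extra pass over the data would plausibly account for most of the gap.

```julia
using FFTW, BenchmarkTools, LinearAlgebra

n = 2^14
x = rand(ComplexF64, n)
y = similar(x)
p = plan_fft(x)                 # unnormalized forward FFT plan, as in the issue
phase = cis.(2π .* rand(n))     # hypothetical elementwise phase factors
s = 1 / sqrt(n)                 # hypothetical normalization factor

# Hypothetical helper: FFT, then one extra in-place pass applying phase and scale.
# The dotted operations fuse, so the scaling is a single loop over y.
function fft_scaled!(y, p, x, phase, s)
    mul!(y, p, x)
    y .*= phase .* s
    return y
end

t_fft = @belapsed mul!($y, $p, $x)
t_scaled = @belapsed fft_scaled!($y, $p, $x, $phase, $s)

println("FFT only:      ", round(t_fft * 1e6; digits = 1), " μs")
println("FFT + scaling: ", round(t_scaled * 1e6; digits = 1), " μs")
```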