/fftw-cufftw-benchmark

Benchmark for popular fft libaries - fftw | cufftw | cufft

Primary LanguageC++Apache License 2.0Apache-2.0

fftw-cufftw-benchmark

Benchmark for popular fft libaries - fftw | cufftw | cufft

Dependancies

Ubuntu Quickstart 18.04

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
sudo reboot

sudo apt-get install libbenchmark-dev libfftw3-dev linux-tools-common linux-tools-generic
sudo cpupower frequency-set -g performance

git clone https://github.com/hurdad/fftw-cufftw-benchmark.git
cd fftw-cufftw-benchmark
make
./cufft-single-benchmark
./cufft-single-unified-benchmark
./cufft-double-benchmark
./cufftwf-benchmark
./cufftw-benchmark
./fftw3f-benchmark
./fftw3-benchmark
./fftw3l-benchmark

Results

  • Intel i7-6700K @ 4.00GHz
  • NVIDIA GTX 1080

cufft-single-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:28:43
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
cu_fft_single/1024/manual_time         19751 ns      19777 ns      37009   395.557MB/s   49.4447M items/s
cu_fft_single/2048/manual_time         23970 ns      23998 ns      29911   651.854MB/s   81.4817M items/s
cu_fft_single/4096/manual_time         27104 ns      27140 ns      26149   1.12593GB/s   144.119M items/s
cu_fft_single/8192/manual_time         32078 ns      32109 ns      21447    1.9027GB/s   243.545M items/s
cu_fft_single/16384/manual_time        53105 ns      53155 ns      12626   2.29867GB/s   294.229M items/s
cu_fft_single/32768/manual_time        87316 ns      87370 ns       7752   2.79604GB/s   357.894M items/s
cu_fft_single/65536/manual_time       160852 ns     160905 ns       4338    3.0356GB/s   388.556M items/s
cu_fft_single/131072/manual_time      311623 ns     311700 ns       2243    3.1338GB/s   401.126M items/s
cu_fft_single/262144/manual_time      548451 ns     548535 ns       1270   3.56117GB/s   455.829M items/s
cu_fft_single/524288/manual_time     1049266 ns    1049361 ns        667   3.72284GB/s   476.524M items/s
cu_fft_single/1048576/manual_time    2112286 ns    2112392 ns        333    3.6986GB/s   473.421M items/s
cu_fft_single_BigO                      2.02 N       2.02 N 
cu_fft_single_RMS                          6 %          6 % 

cufft-single-unified-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:28:54
***WARNING*** Library was built as DEBUG. Timings may be affected.
---------------------------------------------------------------------------------
Benchmark                                          Time           CPU Iterations
---------------------------------------------------------------------------------
cu_fft_single_unified/1024/manual_time         11556 ns      11049 ns      60189   676.032MB/s    84.504M items/s
cu_fft_single_unified/2048/manual_time         14284 ns      13332 ns      51289   1093.91MB/s   136.738M items/s
cu_fft_single_unified/4096/manual_time         22913 ns      19111 ns      29399   1.33191GB/s   170.485M items/s
cu_fft_single_unified/8192/manual_time         58813 ns      41308 ns      12625   1062.69MB/s   132.836M items/s
cu_fft_single_unified/16384/manual_time       119749 ns      88636 ns       5962   1043.85MB/s   130.481M items/s
cu_fft_single_unified/32768/manual_time       640200 ns     434857 ns        936   390.503MB/s   48.8129M items/s
cu_fft_single_unified/65536/manual_time      1003725 ns     766041 ns        620   498.144MB/s    62.268M items/s
cu_fft_single_unified/131072/manual_time      591654 ns     591725 ns       1134   1.65056GB/s   211.272M items/s
cu_fft_single_unified/262144/manual_time     1007616 ns    1007723 ns        696   1.93836GB/s    248.11M items/s
cu_fft_single_unified/524288/manual_time     1768924 ns    1769052 ns        397   2.20826GB/s   282.658M items/s
cu_fft_single_unified/1048576/manual_time    3651642 ns    3651796 ns        192   2.13945GB/s   273.849M items/s
cu_fft_single_unified_BigO                      3.54 N       3.52 N 
cu_fft_single_unified_RMS                         36 %         26 % 

cufft-double-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:29:04
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
cu_fft_double/1024/manual_time         22721 ns      22748 ns      31066    343.84MB/s     42.98M items/s
cu_fft_double/2048/manual_time         34249 ns      34276 ns      20449   456.221MB/s   57.0276M items/s
cu_fft_double/4096/manual_time         55570 ns      55603 ns      11878   562.349MB/s   70.2936M items/s
cu_fft_double/8192/manual_time         60962 ns      61000 ns      10936   1025.23MB/s   128.153M items/s
cu_fft_double/16384/manual_time        95487 ns      95523 ns       7122    1.2784GB/s   163.636M items/s
cu_fft_double/32768/manual_time       176607 ns     176644 ns       3948    1.3824GB/s   176.947M items/s
cu_fft_double/65536/manual_time       344907 ns     344974 ns       2034   1.41569GB/s   181.208M items/s
cu_fft_double/131072/manual_time      608999 ns     609062 ns       1134   1.60355GB/s   205.255M items/s
cu_fft_double/262144/manual_time     1125277 ns    1125374 ns        622   1.73568GB/s   222.167M items/s
cu_fft_double/524288/manual_time     2216979 ns    2217110 ns        317   1.76197GB/s   225.532M items/s
cu_fft_double/1048576/manual_time    4504132 ns    4504475 ns        155   1.73452GB/s   222.018M items/s
cu_fft_double_BigO                      4.29 N       4.29 N 
cu_fft_double_RMS                          4 %          4 % 

cufftwf-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:29:15
***WARNING*** Library was built as DEBUG. Timings may be affected.
--------------------------------------------------------------------
Benchmark                             Time           CPU Iterations
--------------------------------------------------------------------
cu_fftwf/1024/manual_time         20114 ns      20147 ns      34794   388.403MB/s   48.5503M items/s
cu_fftwf/2048/manual_time         23009 ns      23044 ns      30765   679.082MB/s   84.8853M items/s
cu_fftwf/4096/manual_time         27026 ns      27062 ns      26078    1.1292GB/s   144.538M items/s
cu_fftwf/8192/manual_time         34616 ns      34652 ns      20004    1.7632GB/s    225.69M items/s
cu_fftwf/16384/manual_time        56608 ns      56662 ns      11779   2.15642GB/s   276.021M items/s
cu_fftwf/32768/manual_time        96624 ns      96675 ns       7197    2.5267GB/s   323.417M items/s
cu_fftwf/65536/manual_time       175800 ns     175857 ns       3990   2.77748GB/s   355.517M items/s
cu_fftwf/131072/manual_time      445886 ns     445955 ns       1908   2.19016GB/s   280.341M items/s
cu_fftwf/262144/manual_time      829577 ns     829653 ns        837   2.35436GB/s   301.358M items/s
cu_fftwf/524288/manual_time     1626205 ns    1626364 ns        430   2.40207GB/s   307.464M items/s
cu_fftwf/1048576/manual_time    2908991 ns    2909194 ns        242   2.68564GB/s   343.762M items/s
cu_fftwf_BigO                      2.86 N       2.86 N 
cu_fftwf_RMS                         10 %         10 % 

cufftw-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:29:26
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------
Benchmark                            Time           CPU Iterations
-------------------------------------------------------------------
cu_fftw/1024/manual_time         22900 ns      22936 ns      30547   682.307MB/s   42.6442M items/s
cu_fftw/2048/manual_time         35077 ns      35114 ns      19967   890.903MB/s   55.6815M items/s
cu_fftw/4096/manual_time         58327 ns      58364 ns      11334   1071.55MB/s   66.9719M items/s
cu_fftw/8192/manual_time         65203 ns      65241 ns      10220   1.87217GB/s   119.819M items/s
cu_fftw/16384/manual_time       104361 ns     104401 ns       6583   2.33939GB/s   149.721M items/s
cu_fftw/32768/manual_time       193482 ns     193526 ns       3604   2.52365GB/s   161.514M items/s
cu_fftw/65536/manual_time       481119 ns     481174 ns       1427   2.02977GB/s   129.906M items/s
cu_fftw/131072/manual_time      891661 ns     891720 ns        781   2.19043GB/s   140.188M items/s
cu_fftw/262144/manual_time     1685317 ns    1685406 ns        410   2.31781GB/s    148.34M items/s
cu_fftw/524288/manual_time     2970116 ns    2970253 ns        237   2.63037GB/s   168.344M items/s
cu_fftw/1048576/manual_time    6147474 ns    6147773 ns        113   2.54169GB/s   162.668M items/s
cu_fftw_BigO                      5.87 N       5.87 N 
cu_fftw_RMS                          6 %          6 % 

fftw3f-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:29:37
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark                          Time           CPU Iterations
-----------------------------------------------------------------
fftwf/1024/manual_time          1061 ns       1083 ns     655567   7.19347GB/s   920.764M items/s
fftwf/2048/manual_time          2534 ns       2557 ns     274154   6.02087GB/s   770.671M items/s
fftwf/4096/manual_time          7547 ns       7575 ns      92184   4.04385GB/s   517.613M items/s
fftwf/8192/manual_time         22392 ns      22422 ns      31727   2.72572GB/s   348.892M items/s
fftwf/16384/manual_time        46492 ns      46519 ns      14949   2.62562GB/s   336.079M items/s
fftwf/32768/manual_time       113423 ns     113455 ns       6185   2.15248GB/s   275.518M items/s
fftwf/65536/manual_time       250827 ns     250862 ns       2760   1.94669GB/s   249.176M items/s
fftwf/131072/manual_time      532110 ns     532157 ns       1290   1.83527GB/s   234.914M items/s
fftwf/262144/manual_time     1213591 ns    1213698 ns        572   1.60938GB/s       206M items/s
fftwf/524288/manual_time     6432471 ns    6432892 ns        107   621.845MB/s   77.7306M items/s
fftwf/1048576/manual_time   20443220 ns   20443920 ns         34   391.328MB/s    48.916M items/s
fftwf_BigO                      0.00 N^2       0.00 N^2 
fftwf_RMS                         15 %         15 % 

fftw3-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:29:47
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark                          Time           CPU Iterations
-----------------------------------------------------------------
fftw3/1024/manual_time          2253 ns       2276 ns     309133   6.77168GB/s   433.387M items/s
fftw3/2048/manual_time          5361 ns       5384 ns     127255   5.69275GB/s   364.336M items/s
fftw3/4096/manual_time         13797 ns      13821 ns      49018   4.42372GB/s   283.118M items/s
fftw3/8192/manual_time         37622 ns      37655 ns      18259   3.24462GB/s   207.656M items/s
fftw3/16384/manual_time        84457 ns      84492 ns       8428    2.8907GB/s   185.005M items/s
fftw3/32768/manual_time       186364 ns     186396 ns       3715   2.62004GB/s   167.682M items/s
fftw3/65536/manual_time       385735 ns     385771 ns       1833   2.53169GB/s   162.028M items/s
fftw3/131072/manual_time      891761 ns     891825 ns        759   2.19019GB/s   140.172M items/s
fftw3/262144/manual_time     2539636 ns    2539757 ns        267   1.53811GB/s   98.4393M items/s
fftw3/524288/manual_time    10510215 ns   10510737 ns         62   761.164MB/s   47.5728M items/s
fftw3/1048576/manual_time   38155139 ns   38156244 ns         19   419.341MB/s   26.2088M items/s
fftw3_BigO                      0.00 N^2       0.00 N^2 
fftw3_RMS                          7 %          7 % 

fftw3l-benchmark

Run on (8 X 4200 MHz CPU s)
2018-12-08 17:31:02
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------
Benchmark                          Time           CPU Iterations
-----------------------------------------------------------------
fftwl/1024/manual_time         26328 ns      26351 ns      26494   1.15914GB/s   37.0926M items/s
fftwl/2048/manual_time         57811 ns      57836 ns      11983   1081.11MB/s   33.7845M items/s
fftwl/4096/manual_time        129008 ns     129042 ns       5396   968.932MB/s   30.2791M items/s
fftwl/8192/manual_time        296533 ns     296567 ns       2373   843.076MB/s   26.3461M items/s
fftwl/16384/manual_time       624196 ns     624240 ns       1096    801.03MB/s   25.0322M items/s
fftwl/32768/manual_time      1362136 ns    1362196 ns        511   734.141MB/s   22.9419M items/s
fftwl/65536/manual_time      2899627 ns    2899727 ns        241   689.744MB/s   21.5545M items/s
fftwl/131072/manual_time     6767509 ns    6767711 ns         93   591.059MB/s   18.4706M items/s
fftwl/262144/manual_time    18643120 ns   18643650 ns         36   429.113MB/s   13.4098M items/s
fftwl/524288/manual_time    54239101 ns   54240414 ns         12    294.99MB/s   9.21844M items/s
fftwl/1048576/manual_time  123493768 ns  123496302 ns          5   259.122MB/s   8.09757M items/s
fftwl_BigO                      5.70 NlgN       5.70 NlgN 
fftwl_RMS                         19 %         19 %