To compile:
cd fft_onchip
mkdir build
cd build
cmake ../ -DCMAKE_CUDA_ARCHITECTURES=86 -Dmathdx_ROOT=/home/path/to/mathdx/nvidia/mathdx/22.11 -GNinja
ninja
./gpu_fft
The project currently contains 4 kernels:
- using cuFFTDx
include/reference.cuh
- custom using tensor cores
include/tensor_fft.cuh
for 64-point FFT with Karatsuba-algorithm-based MMA
- custom using radix-8 DIF kernel
include/legacy8_fft.cuh
- custom using radix-16 DIF kernel
include/legacy16_fft.cuh
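For readers unfamiliar with the Karatsuba trick mentioned for the tensor core kernel, here is a minimal scalar sketch in host-side C++. It assumes that "karatsuba" refers to the three-multiplication complex product (also known as the Gauss trick); the same identity carries over to matrices, where each real multiplication becomes one real matrix product. The function name `karatsuba_mul` is hypothetical and is not taken from include/tensor_fft.cuh, which may organize the computation differently.

```cpp
#include <complex>

// Hypothetical helper, not part of this repository.
// Computes (a + bi) * (c + di) with three real multiplications
// instead of the usual four:
//   real = ac - bd
//   imag = (a + b)(c + d) - ac - bd   (= ad + bc)
std::complex<double> karatsuba_mul(std::complex<double> x,
                                   std::complex<double> y) {
    const double a = x.real(), b = x.imag();
    const double c = y.real(), d = y.imag();

    const double ac = a * c;                 // multiplication 1
    const double bd = b * d;                 // multiplication 2
    const double cross = (a + b) * (c + d);  // multiplication 3

    return {ac - bd, cross - ac - bd};
}
```

The trade-off is one fewer multiplication at the cost of extra additions, which can also change rounding behavior slightly compared with the four-multiplication form.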
the output should look similar to this:
Tensor FFT took: 124.511 microseconds
Reference took: 147.854 microseconds
MSE: 2.33208e-14
To see a side-by-side comparison of the results in addition to the MSE, set config.hpp::print_results to true.
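As a rough illustration of what an MSE figure like the one above measures, the sketch below compares two complex result buffers on the host. It assumes the MSE is the mean of the squared magnitudes of the element-wise differences; the function name `mean_squared_error` and the use of `std::vector<std::complex<float>>` are assumptions for illustration, and the repository's own comparison code may differ.

```cpp
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of this repository.
// Returns the mean of |a[i] - b[i]|^2 over all elements.
double mean_squared_error(const std::vector<std::complex<float>>& a,
                          const std::vector<std::complex<float>>& b) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("buffers must have the same length");
    }
    if (a.empty()) {
        return 0.0;
    }

    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Accumulate in double so small differences are not lost.
        const std::complex<double> diff =
            std::complex<double>(a[i]) - std::complex<double>(b[i]);
        sum += std::norm(diff);  // std::norm returns the squared magnitude
    }
    return sum / static_cast<double>(a.size());
}
```

Under this definition, an MSE close to zero means the two kernels produce nearly identical outputs.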