mrnorman/YAKL

CUDA ffts can fail due to invalid alignment

mwarusz opened this issue · 2 comments

This code:

#include <YAKL.h>
#include <YAKL_fft.h>

using double1d = yakl::Array<double, 1, yakl::memDevice, yakl::styleC>;

int main()
{
  yakl::init();
  {
    double1d b("b", 1);
    int nx = 100;
    double1d a("a", nx+2);
    yakl::RealFFT1D<double> fft;
    fft.init(a, 0, nx);
    fft.forward_real(a);
  }
  yakl::finalize();
}

fails on CUDA with ERROR: YAKL CUFFT: /home/mwarusz/repos/yakl_playground/externals/YAKL/src/extensions/YAKL_fft.h: 217 when using the default Gator settings. The documentation of cufftExecD2Z states that its input and output should be aligned to cufftDoubleComplex, which is 16 bytes. However, the default Gator block size is sizeof(size_t) bytes (8 on typical 64-bit platforms), which is not enough to guarantee that alignment.
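To make the alignment argument concrete, here is a minimal sketch of the arithmetic. It is not YAKL or Gator code: round_up, second_offset, and is_aligned are hypothetical helpers. It assumes a pool that hands out consecutive allocations rounded up to whole blocks, and a pool base address that is itself 16-byte aligned. Under those assumptions, the one-element array b in the reproducer would push a to an offset that is a multiple of 8 but not of 16.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of YAKL): round `bytes` up to a whole number of blocks.
inline std::size_t round_up(std::size_t bytes, std::size_t block) {
  assert(block > 0);
  return (bytes + block - 1) / block * block;
}

// Hypothetical pool model: the first allocation starts at offset 0 and the
// second one starts right after it, rounded up to the block size.
inline std::size_t second_offset(std::size_t first_bytes, std::size_t block) {
  return round_up(first_bytes, block);
}

// True if `offset` is a multiple of `alignment`.
inline bool is_aligned(std::size_t offset, std::size_t alignment) {
  assert(alignment > 0);
  return offset % alignment == 0;
}
```

With an 8-byte block, second_offset(sizeof(double), 8) is 8, which fails is_aligned(..., 16). With a 16-byte block the same allocation lands at offset 16, which passes.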

Setting GATOR_BLOCK_BYTES=16 is a workaround. Additionally, this message

if (yakl::yakl_mainproc()) std::cout << "WARNING: Invalid GATOR_BLOCK_BYTES. Defaulting to 128*sizeof(size_t)\n";

seems wrong. The default is clearly not 128*sizeof(size_t).
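One possible way to keep such a warning from drifting out of sync with the real default is to build the message from the same constant the fallback uses. The sketch below is only an illustration: kDefaultBlockBytes, its value, resolve_block_bytes, and the "positive means valid" rule are all assumptions made for the example, not YAKL's actual code or defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical default, chosen for illustration only; not YAKL's actual value.
constexpr std::size_t kDefaultBlockBytes = 16;

// Build the warning from the constant so the text always matches the fallback.
inline std::string invalid_block_bytes_warning() {
  return "WARNING: Invalid GATOR_BLOCK_BYTES. Defaulting to " +
         std::to_string(kDefaultBlockBytes) + " bytes\n";
}

// Hypothetical validity rule: any positive request is accepted as-is,
// anything else falls back to the default.
inline std::size_t resolve_block_bytes(long requested) {
  std::size_t result = requested > 0 ? static_cast<std::size_t>(requested)
                                     : kDefaultBlockBytes;
  assert(result > 0);
  return result;
}
```

Because both the fallback and the message read kDefaultBlockBytes, changing the default in one place updates the warning as well.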

It's really impressive that you found the root cause! I'm working on this now. Thanks.

This is fixed by commit 75cd1c9. Your reproducer is in unit tests as well.

Seriously, thanks again, @mwarusz, for making this so easy for me to fix.