CUDA FFTs can fail due to invalid alignment
mwarusz opened this issue · 2 comments
mwarusz commented
This code:
#include <YAKL.h>
#include <YAKL_fft.h>

using double1d = yakl::Array<double, 1, yakl::memDevice, yakl::styleC>;

int main()
{
    yakl::init();
    {
        double1d b("b", 1);
        int nx = 100;
        double1d a("a", nx+2);
        yakl::RealFFT1D<double> fft;
        fft.init(a, 0, nx);
        fft.forward_real(a);
    }
    yakl::finalize();
}
fails on CUDA with
ERROR: YAKL CUFFT: /home/mwarusz/repos/yakl_playground/externals/YAKL/src/extensions/YAKL_fft.h: 217
when using the default Gator settings. The documentation of cufftExecD2Z states that its input and output should be aligned to cufftDoubleComplex, which is 16 bytes. However, the default Gator block size is sizeof(size_t) bytes, which is insufficient to guarantee that.
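To illustrate the arithmetic, here is a minimal sketch assuming a simplified pool that places allocations back to back and rounds each one up to a whole number of blocks. The helpers round_up, next_offset and is_aligned are hypothetical and are not Gator's actual code; the 8-byte figures assume a 64-bit platform where sizeof(size_t) and sizeof(double) are both 8.

```cpp
#include <cstddef>

// Hypothetical helper: round a request up to a whole number of pool blocks.
constexpr std::size_t round_up(std::size_t bytes, std::size_t block_bytes) {
    return ((bytes + block_bytes - 1) / block_bytes) * block_bytes;
}

// Offset (from an assumed 16-byte-aligned pool base) at which the second
// allocation starts, if the first allocation occupies first_bytes bytes.
constexpr std::size_t next_offset(std::size_t first_bytes, std::size_t block_bytes) {
    return round_up(first_bytes, block_bytes);
}

// True if the offset is a multiple of the required alignment.
constexpr bool is_aligned(std::size_t offset, std::size_t alignment) {
    return offset % alignment == 0;
}

// One double (8 bytes) allocated first, as with array "b" in the reproducer.
// 8-byte blocks: the next array starts at offset 8, which is not 16-byte aligned.
static_assert(!is_aligned(next_offset(8, 8), 16), "8-byte blocks can break alignment");
// 16-byte blocks: the next array starts at offset 16, which is 16-byte aligned.
static_assert(is_aligned(next_offset(8, 16), 16), "16-byte blocks preserve alignment");
```

Under these assumptions, the one-element array b is what pushes a onto an 8-byte boundary. A 16-byte block size keeps every allocation on a 16-byte boundary, provided the pool base itself is aligned that way.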
Setting GATOR_BLOCK_BYTES=16 is a workaround. Additionally, this message (line 86 in d52102d) seems wrong: the default is clearly not 128*sizeof(size_t).
mrnorman commented
That's really impressive that you found the root cause! I'm working on this now. Thanks