eyalroz/cuda-kat

Account for dimensions exceeding the domain of 32-bit (unsigned) integers

eyalroz opened this issue · 0 comments

A linear CUDA grid can have up to 2^31-1 blocks (in the x dimension), each with up to 1024 threads, for a total of a little under 2^41 threads. Currently, our types and grid_info functions assume that all dimensions fit within 32-bit integers, and that is not the case.
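To make the arithmetic concrete, here is a small host-side C++ check of the numbers above. The constant names are made up for illustration and are not part of cuda-kat.

```cpp
#include <cstdint>
#include <limits>

// Limits quoted above: blocks in the x dimension, and threads per block.
// (Illustrative names only; not cuda-kat identifiers.)
constexpr std::uint64_t max_grid_dim_x = (std::uint64_t{1} << 31) - 1;
constexpr std::uint64_t max_block_size = 1024;

// Total threads in the largest linear grid, computed in 64 bits.
constexpr std::uint64_t max_threads = max_grid_dim_x * max_block_size;

// "A little under 2^41": exactly 2^41 - 1024.
static_assert(max_threads == (std::uint64_t{1} << 41) - 1024,
              "unexpected thread count");

// The total does not fit in a 32-bit unsigned integer.
static_assert(max_threads > std::numeric_limits<std::uint32_t>::max(),
              "total unexpectedly fits in 32 bits");

// The same product in 32-bit unsigned arithmetic silently wraps around.
constexpr std::uint32_t wrapped_threads =
    static_cast<std::uint32_t>(max_grid_dim_x) *
    static_cast<std::uint32_t>(max_block_size);
```

The wrapped value is well-defined (unsigned arithmetic is modular) but meaningless as a thread count, which is why a 32-bit global thread index is unsafe for such grids.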

At the same time, it is costly to default to 64-bit values for dimensions when a kernel author knows that the dimensions don't actually exceed 32 bits. (Limiting to 16 bits is less useful, since NVIDIA GPU cores don't operate faster on 16-bit integers.)

So, we need to figure out how to support over-32-bit dimensions while not forcing them on users by default. Currently we simply do not support this.
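One possible direction, offered only as a sketch and not as a decision: let the index type be a template parameter that defaults to 32 bits, so that wider arithmetic is opt-in. The struct and function below are hypothetical and are not existing cuda-kat API. `linear_position` merely stands in for the built-in `blockIdx.x`, `blockDim.x` and `threadIdx.x` values, so the example compiles as plain host-side C++.

```cpp
#include <cstdint>

// Hypothetical stand-in for blockIdx.x, blockDim.x and threadIdx.x.
struct linear_position {
    std::uint32_t block_index;
    std::uint32_t block_size;
    std::uint32_t thread_index;
};

// Hypothetical helper: the result type defaults to 32 bits, and callers
// who expect very large grids opt in to std::uint64_t explicitly.
// Intended for Size = std::uint32_t or std::uint64_t only.
template <typename Size = std::uint32_t>
constexpr Size global_thread_index(linear_position p)
{
    // Convert each operand to Size before multiplying, so that the
    // product is computed at the requested width.
    return static_cast<Size>(p.block_index) * static_cast<Size>(p.block_size)
         + static_cast<Size>(p.thread_index);
}
```

With this shape, `global_thread_index(p)` keeps today's 32-bit behavior and cost, while `global_thread_index<std::uint64_t>(p)` gives a correct index for grids with more than 2^32 threads. Whether a per-call template parameter, a library-wide typedef, or some other mechanism fits best is left open.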