Using a smaller block size
pclucas14 opened this issue · 0 comments
Hi,
First of all, thanks for setting up this package :) It's super helpful.
I'm wondering, is there a way to use a smaller block size? I tried modifying the Python code so that no errors are thrown there, but I'm now hitting the following error when calling the CUDA kernel:
RuntimeError: CUDA error: an illegal memory access was encountered
I looked into the kernel code a bit, and it seems that the block_size argument is not used. So I'm curious how the kernel knows to expect a minimum size of 32.
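To make my guess concrete, here is a minimal pure-Python sketch of what I suspect is going on. It is not the package's actual kernel: `KERNEL_TILE`, `buffer_size`, `max_offset_touched` and `is_out_of_bounds` are hypothetical names, and the assumption that the kernel hard-codes a 32x32 tile is mine. If that assumption holds, a buffer allocated on the Python side for a smaller block size would be too small for the offsets the kernel reads, which would explain the illegal memory access.

```python
# Hypothetical sketch, NOT the package's real code.
# Assumption: the CUDA kernel bakes in a 32x32 tile as a compile-time
# constant and ignores the block_size argument passed from Python.
KERNEL_TILE = 32


def buffer_size(num_blocks: int, block_size: int) -> int:
    """Number of elements the Python side allocates for the block data."""
    return num_blocks * block_size * block_size


def max_offset_touched(num_blocks: int, tile: int = KERNEL_TILE) -> int:
    """Largest flat offset a kernel reads if it assumes tile x tile blocks."""
    return num_blocks * tile * tile - 1


def is_out_of_bounds(num_blocks: int, block_size: int) -> bool:
    """True if the assumed kernel would read past the allocated buffer."""
    return max_offset_touched(num_blocks) >= buffer_size(num_blocks, block_size)


if __name__ == "__main__":
    for bs in (32, 16, 8):
        print(bs, buffer_size(4, bs), max_offset_touched(4), is_out_of_bounds(4, bs))
```

With 4 blocks, a block size of 32 stays in bounds, while 16 and 8 do not, which matches the behaviour I'm seeing only if the tile size really is fixed inside the kernel.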
Any clarification would be super helpful!
Thanks