Get to build and run with no errors and possibly reasonable output
Closed this issue · 2 comments
brigb123 commented
Master still has runtime CUDA errors. The develop branch might solve this.
brigb123 commented
Up to 63 registers are used per thread at a time. A common maximum for registers per block is 32768. 32768 / 63 rounds down to a maximum of 520 threads per block, a value that is often seen passed in multiple runs. Clamping threads/block to less than (max registers per block / 63) will need to be implemented.
Alternatively, the files that use too many registers may need to be modified to use fewer.
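A minimal sketch of the clamping option, written as a host-side C++ helper that could sit in a .cu file. The function name `clampThreadsPerBlock` and its parameters are hypothetical and not taken from the repository. The sketch assumes the per-block register limit and the per-thread register count are known up front; on a real device they could probably be read from `cudaDeviceProp::regsPerBlock` and `cudaFuncAttributes::numRegs` rather than hard-coded.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the repository): returns the largest
// thread count per block that does not exceed the register budget.
//   requestedThreads : threads per block the caller wants to launch
//   regsPerBlock     : register limit per block (e.g. 32768)
//   regsPerThread    : registers used by one thread of the kernel (e.g. 63)
int clampThreadsPerBlock(int requestedThreads, int regsPerBlock, int regsPerThread) {
    assert(requestedThreads > 0 && regsPerBlock > 0);
    if (regsPerThread <= 0) {
        // Nothing known about register usage: leave the request unchanged.
        return requestedThreads;
    }
    // Integer division rounds down, so 32768 / 63 yields 520.
    const int maxThreadsByRegisters = regsPerBlock / regsPerThread;
    return std::min(requestedThreads, maxThreadsByRegisters);
}
```

With the numbers from this comment, `clampThreadsPerBlock(1024, 32768, 63)` returns 520, while a request of 256 is returned unchanged. In practice it may be safer to round the result down to a multiple of the warp size (32), which would give 512 here, since the hardware may allocate registers at a coarser granularity than one thread.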