Implement fast CUDA mutex

Now that numba 0.57 has been released with support for CUDA atomic compare-and-swap (numba/numba#8790), we should use it to speed up CUDA reductions such as max_n, building on #1196. The code will check the installed numba version to choose between the new and the old mutex implementation.
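
A rough sketch of how the version check and the two mutex variants might look is below. It is an illustration under assumptions, not the final implementation: the helper names (`has_indexed_cas`, `make_mutex_functions`, `cuda_mutex_lock`, `cuda_mutex_unlock`) are hypothetical, the mutex is assumed to be an integer device array where 0 means unlocked and 1 means locked, and the new path assumes the indexed `cuda.atomic.cas` and `cuda.atomic.exch` calls available from numba 0.57.

```python
def has_indexed_cas(numba_version):
    """Return True if the version string is numba 0.57 or newer.

    Only the leading major and minor components are compared, so
    strings such as '0.57.0rc1' are handled too.
    """
    major, minor = numba_version.split(".")[:2]
    return (int(major), int(minor)) >= (0, 57)


def make_mutex_functions():
    """Build (lock, unlock) device functions for the installed numba.

    Hypothetical helper: numba is imported lazily so that the version
    check above can be used and tested without a CUDA installation.
    """
    import numba
    from numba import cuda

    if has_indexed_cas(numba.__version__):
        # New path: one lock per element of the mutex array.
        @cuda.jit(device=True)
        def cuda_mutex_lock(mutex, index):
            # Spin until this thread changes mutex[index] from 0 to 1.
            while cuda.atomic.cas(mutex, index, 0, 1) != 0:
                pass
            cuda.threadfence()

        @cuda.jit(device=True)
        def cuda_mutex_unlock(mutex, index):
            cuda.threadfence()
            cuda.atomic.exch(mutex, index, 0)
    else:
        # Old path: compare_and_swap only acts on the first element,
        # so every thread contends for a single global lock and the
        # index argument is ignored.
        @cuda.jit(device=True)
        def cuda_mutex_lock(mutex, index):
            while cuda.atomic.compare_and_swap(mutex, 0, 1) != 0:
                pass
            cuda.threadfence()

        @cuda.jit(device=True)
        def cuda_mutex_unlock(mutex, index):
            cuda.threadfence()
            mutex[0] = 0

    return cuda_mutex_lock, cuda_mutex_unlock
```

Keeping the same `(mutex, index)` signature in both branches would let the reduction kernels call the lock and unlock functions without knowing which numba version is installed. The expected speedup comes from the new path locking individual elements rather than serializing every thread on one global lock.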