MonroeD opened this issue 2 years ago · 0 comments
In some cases, inline PTX code may not perform any better than the equivalent CUDA C code.
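One way to check this on a given GPU is to write the same operation twice, once as inline PTX and once in plain CUDA C, and time both. The snippet below is only a minimal sketch: the names `add_ptx`, `add_c`, `kernel_ptx`, `kernel_c` and `results_match` are made up for illustration, and the integer add was chosen just because it is simple. For an operation this trivial the compiler will likely generate the same machine code for both versions, and in larger code an `asm` statement can get in the way of compiler optimizations such as constant folding, so I would not expect the PTX version to win here.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: 32-bit integer add written as inline PTX.
__device__ __forceinline__ int add_ptx(int a, int b) {
    int r;
    asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

// The same operation in plain CUDA C.
__device__ __forceinline__ int add_c(int a, int b) { return a + b; }

__global__ void kernel_ptx(const int* a, const int* b, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = add_ptx(a[i], b[i]);
}

__global__ void kernel_c(const int* a, const int* b, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = add_c(a[i], b[i]);
}

// Runs both kernels on the same input, prints one timing for each,
// and returns true if the two outputs are identical.
bool results_match(int n) {
    std::vector<int> ha(n), hb(n), hp(n), hc(n);
    for (int i = 0; i < n; ++i) { ha[i] = i; hb[i] = 2 * i; }

    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *da, *db, *dp, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dp, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Warm-up launches so one-time startup cost is not counted.
    kernel_ptx<<<blocks, threads>>>(da, db, dp, n);
    kernel_c<<<blocks, threads>>>(da, db, dc, n);
    cudaDeviceSynchronize();

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);

    cudaEventRecord(t0);
    kernel_ptx<<<blocks, threads>>>(da, db, dp, n);
    cudaEventRecord(t1);
    kernel_c<<<blocks, threads>>>(da, db, dc, n);
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float ms_ptx = 0.0f, ms_c = 0.0f;
    cudaEventElapsedTime(&ms_ptx, t0, t1);
    cudaEventElapsedTime(&ms_c, t1, t2);
    printf("inline PTX: %.3f ms, CUDA C: %.3f ms\n", ms_ptx, ms_c);

    cudaMemcpy(hp.data(), dp, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(hc.data(), dc, bytes, cudaMemcpyDeviceToHost);

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaFree(da);
    cudaFree(db);
    cudaFree(dp);
    cudaFree(dc);
    return hp == hc;
}

int main() {
    return results_match(1 << 20) ? 0 : 1;
}
```

A single timed launch like this is noisy, so a real comparison should repeat the measurement many times. Comparing the generated machine code of the two kernels (for example with `cuobjdump -sass`) is probably a more reliable way to see whether the inline PTX changes anything at all.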