Answering "What is the fastest way to return a single scalar from a kernel to host?"
- CUDA Version: 11.2
- GPU: Quadro GV100
- Driver: 460.32.03
- CPU: Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz
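
The benchmarks themselves are not shown in this section, so the following is only a minimal sketch of two commonly used candidates for getting one scalar back to the host: an explicit `cudaMemcpy` from device memory, and mapped pinned host memory that the kernel writes to directly. The kernel `write_scalar` and all variable names are hypothetical and chosen for illustration; they are not taken from the repository, and the sketch omits error checking and timing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single thread writes one scalar to the given address.
__global__ void write_scalar(int *out) { *out = 42; }

int main() {
    // Candidate A: result lives in device memory and is copied back explicitly.
    int *d_out = nullptr;
    int result_a = 0;
    cudaMalloc((void **)&d_out, sizeof(int));
    write_scalar<<<1, 1>>>(d_out);
    // A blocking copy on the default stream; it runs after the kernel has finished.
    cudaMemcpy(&result_a, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Candidate B: mapped pinned host memory; the kernel writes to it directly.
    int *h_mapped = nullptr;
    int *d_mapped = nullptr;
    cudaHostAlloc((void **)&h_mapped, sizeof(int), cudaHostAllocMapped);
    *h_mapped = 0;
    cudaHostGetDevicePointer((void **)&d_mapped, h_mapped, 0);
    write_scalar<<<1, 1>>>(d_mapped);
    // The host must wait for the kernel before reading the mapped value.
    cudaDeviceSynchronize();
    int result_b = *h_mapped;
    cudaFreeHost(h_mapped);

    printf("memcpy: %d, mapped: %d\n", result_a, result_b);
    return 0;
}
```

Which candidate is faster depends on the hardware, driver, and how the launch and synchronization are timed, so any comparison should be measured on the configuration listed above rather than assumed.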