c3sr/comm_scope

prefetch-duplex GPU/GPU: may be able to associate both streams with a single device

cwpearson opened this issue · 0 comments

If so, we would not need to measure the cost of cudaStreamSynchronize().
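
A rough sketch of what this could look like, not taken from the comm_scope source. It assumes two GPUs with concurrent managed access, assumes the runtime accepts a prefetch whose destination differs from the stream's device (the open question in this issue), and uses the four-argument cudaMemPrefetchAsync from CUDA 12.x and earlier; newer toolkits may require a cudaMemLocation argument instead. The helper name duplex_prefetch_single_device and its return codes are hypothetical. With both streams created on device 0, the two directions can be bracketed by events on that device, so no host-side cudaStreamSynchronize() falls inside the timed region.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Bail out with code 2 on any CUDA error (sketch only: no cleanup on failure).
#define CHECK(call)                                                   \
  do {                                                                \
    cudaError_t e_ = (call);                                          \
    if (e_ != cudaSuccess) {                                          \
      fprintf(stderr, "%s failed: %s\n", #call, cudaGetErrorString(e_)); \
      return 2;                                                       \
    }                                                                 \
  } while (0)

// Hypothetical helper. Returns 0 on success (elapsed time in *ms),
// 1 if the machine cannot run the test, 2 on a CUDA error.
int duplex_prefetch_single_device(size_t bytes, float *ms) {
  int n = 0;
  if (cudaGetDeviceCount(&n) != cudaSuccess || n < 2) return 1;
  for (int dev = 0; dev < 2; ++dev) {
    int ok = 0;
    CHECK(cudaDeviceGetAttribute(&ok, cudaDevAttrConcurrentManagedAccess, dev));
    if (!ok) return 1;  // prefetching needs concurrent managed access
  }

  // Both streams and all events belong to device 0.
  CHECK(cudaSetDevice(0));
  cudaStream_t s0, s1;
  cudaEvent_t start, done1, stop;
  CHECK(cudaStreamCreate(&s0));
  CHECK(cudaStreamCreate(&s1));
  CHECK(cudaEventCreate(&start));
  CHECK(cudaEventCreate(&done1));
  CHECK(cudaEventCreate(&stop));

  char *a = nullptr, *b = nullptr;
  CHECK(cudaMallocManaged(&a, bytes));
  CHECK(cudaMallocManaged(&b, bytes));

  // Untimed setup: place a on device 0 and b on device 1.
  CHECK(cudaMemPrefetchAsync(a, bytes, 0, s0));
  CHECK(cudaMemPrefetchAsync(b, bytes, 1, s0));
  CHECK(cudaStreamSynchronize(s0));

  // Timed region: s1 starts after `start`, s0 finishes after s1 is done.
  CHECK(cudaEventRecord(start, s0));
  CHECK(cudaStreamWaitEvent(s1, start, 0));
  CHECK(cudaMemPrefetchAsync(a, bytes, 1, s0));  // device 0 -> device 1
  CHECK(cudaMemPrefetchAsync(b, bytes, 0, s1));  // device 1 -> device 0
  CHECK(cudaEventRecord(done1, s1));
  CHECK(cudaStreamWaitEvent(s0, done1, 0));
  CHECK(cudaEventRecord(stop, s0));

  // Host waits once, after the timed region has been fully enqueued.
  CHECK(cudaEventSynchronize(stop));
  CHECK(cudaEventElapsedTime(ms, start, stop));

  CHECK(cudaFree(a));
  CHECK(cudaFree(b));
  CHECK(cudaEventDestroy(start));
  CHECK(cudaEventDestroy(done1));
  CHECK(cudaEventDestroy(stop));
  CHECK(cudaStreamDestroy(s0));
  CHECK(cudaStreamDestroy(s1));
  return 0;
}
```

Calling duplex_prefetch_single_device(1 << 20, &ms) would time a 1 MiB prefetch in each direction. Whether the two prefetches actually overlap when both streams live on one device would still need to be checked, for example in a profiler timeline.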