Replace deep_copy in steering vector creation subroutine with parallel_reduce

Question

Replace deep_copy in steering vector creation subroutine with parallel_reduce


Answer 1 · 2022-08-25T20:39:22.000Z

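A deep_copy is needed here to avoid memory access errors: the count is stored in a device View, so it has to be copied back to the host before the host can use it as a loop bound.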

Answer 2 · 2022-08-25T21:00:59.000Z

If I remember correctly, the idea here was to remove the count View entirely. Instead, we can return a scalar from the reduction and use it as the bound for the next loop.
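
As a rough illustration of that idea, here is a plain C++17 sketch. It is not Kokkos code and not the project's code. It mimics what a `parallel_reduce` does: each worker thread accumulates a private partial count, the partials are combined into one scalar, and that scalar is returned directly, so no count View is involved. The integer cell-state encoding, the `needsUpdate` predicate, and `countCellsToSteer` are hypothetical stand-ins for the real steering-vector logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical predicate standing in for the real
// "does this cell go in the steering vector?" test.
inline bool needsUpdate(int cellState) { return cellState == 1; }

// Plain C++ analogue of a parallel_reduce: each thread counts its own
// contiguous chunk into a private partial sum (no atomics needed), and the
// partial sums are combined into a single scalar returned to the caller.
inline int countCellsToSteer(const std::vector<int>& cellState, unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = cellState.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;
    std::vector<int> partial(numThreads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(n, t * chunk);
            const std::size_t end = std::min(n, begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                if (needsUpdate(cellState[i])) ++partial[t];  // private slot, no race
        });
    }
    for (auto& w : workers) w.join();
    // Combine the partial results: this is the scalar the reduction "returns".
    return std::accumulate(partial.begin(), partial.end(), 0);
}
```

Called as `int numSteer = countCellsToSteer(cellState, 4);`, the result is an ordinary host integer, so a following `for (int s = 0; s < numSteer; ++s)` loop can use it as its bound without any copy. One design note: as we understand the Kokkos documentation, a `parallel_reduce` into a plain host scalar does not return until the result is ready, so the host still waits on the device; the synchronization that the deep_copy provided moves into the reduction rather than disappearing.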

Answer 3 · 2022-08-30T16:58:17.000Z

If we did this, we would need to split the count out into its own kernel, then fill the steering vector, then do the cell capture. So we wouldn't actually reduce the number of kernel launches, but it may still be faster without the atomic and with each kernel being simpler.
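
The split described above can be sketched as three separate passes, each standing in for its own kernel launch. This is a sequential plain C++17 sketch under stated assumptions, not the subroutine's actual code: `needsUpdate`, `captureCell`, `splitSteeringPipeline`, and the integer cell-state encoding are hypothetical. Written serially, the fill pass needs no atomic; in a parallel fill, each thread would still need a unique slot in the steering vector, which is the point Answer 4 raises.

```cpp
#include <cassert>
#include <vector>

// Hypothetical predicate: does this cell belong in the steering vector?
inline bool needsUpdate(int cellState) { return cellState == 1; }

// Hypothetical cell-capture step: mark a steered cell as processed.
inline void captureCell(std::vector<int>& cellState, int cellIndex) {
    cellState[cellIndex] = 2;
}

// Runs the three passes and returns the steering vector so it can be inspected.
inline std::vector<int> splitSteeringPipeline(std::vector<int>& cellState) {
    const int numCells = static_cast<int>(cellState.size());

    // Pass 1 ("count kernel"): a pure reduction, no atomic, no count View.
    int numSteer = 0;
    for (int i = 0; i < numCells; ++i)
        if (needsUpdate(cellState[i])) ++numSteer;

    // Pass 2 ("fill kernel"): the count is known up front, so the steering
    // vector can be sized exactly before it is filled.
    std::vector<int> steeringVector(numSteer);
    int next = 0;  // serial cursor; a parallel fill would need an atomic here
    for (int i = 0; i < numCells; ++i)
        if (needsUpdate(cellState[i])) steeringVector[next++] = i;
    assert(next == numSteer);  // both passes must agree on the count

    // Pass 3 ("cell-capture kernel"): the scalar count is the loop bound.
    for (int s = 0; s < numSteer; ++s)
        captureCell(cellState, steeringVector[s]);

    return steeringVector;
}
```

The trade-off matches the comment: three launches instead of one fused kernel, with the count pass free of atomics.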

Answer 4 · 2022-09-06T15:19:27.000Z

The atomic operation in the parallel_reduce is needed regardless, and the overhead of the thread-safe addition in the parallel_reduce appears to be larger than the time required for the deep_copy at the end of the parallel_for.
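
For comparison, the existing fused approach can be sketched in plain C++17, with `std::atomic<int>::fetch_add` standing in for an atomic increment on the count View. This assumes the subroutine hands out steering-vector slots with an atomic counter, as the earlier comments imply; it is not the actual code, and `needsUpdate` and `fillSteeringVectorAtomic` are hypothetical. The sketch shows why a reduction adds little: the atomic that assigns slots already ends up holding the count, so the only extra cost of the current approach is reading that one integer back.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical predicate: does this cell belong in the steering vector?
inline bool needsUpdate(int cellState) { return cellState == 1; }

// Analogue of the single fused parallel_for: every thread claims a unique
// steering-vector slot with an atomic fetch-add. The counter's final value is
// the number of steered cells (in Kokkos, the value read back by deep_copy).
inline int fillSteeringVectorAtomic(const std::vector<int>& cellState,
                                    std::vector<int>& steeringVector,
                                    unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = cellState.size();
    const std::size_t chunk = (n + numThreads - 1) / numThreads;
    steeringVector.assign(n, -1);  // worst-case capacity, like a fixed-size View
    std::atomic<int> numSteer{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(n, t * chunk);
            const std::size_t end = std::min(n, begin + chunk);
            for (std::size_t i = begin; i < end; ++i) {
                if (needsUpdate(cellState[i])) {
                    // Each fetch_add returns a distinct slot, so writes never collide.
                    const int slot = numSteer.fetch_add(1);
                    steeringVector[slot] = static_cast<int>(i);
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return numSteer.load();  // plays the role of the copied-back count
}
```

Entry order in the steering vector depends on thread scheduling, so only the first `numSteer` entries are meaningful and their order is not guaranteed.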

Answer 5 · 2022-09-06T15:20:06.000Z

Closing as won't do.