ROCm/Thrust

copy_if doesn't work on the HIP/ROCm build path

Issue closed · 1 comment

copy_if is listed as supported on the HIP/ROCm build path, but it doesn't work. Running the stream_compaction.cu example under gdb shows the problem: the kernel fails to launch because it was never compiled. The error is:

_ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_36commutative_reduce_intervals_closureINS_18transform_iteratorINS_6detail21predicate_to_integralI6is_oddIiElEENS7_15normal_iteratorINS_10device_ptrIiEEEElNS_11use_defaultEEENSC_INS_7pointerIlNS2_3tagESG_SG_EEEENS_4plusIlEENS0_6detail8internal21uniform_decompositionIlEENS3_20blocked_thread_arrayEEEEEvT_.

Demangled with c++filt, this becomes:

void thrust::system::cuda::detail::detail::launch_closure_by_value<thrust::system::cuda::detail::commutative_reduce_intervals_closure<thrust::transform_iterator<thrust::detail::predicate_to_integral<is_odd<int>, long>, thrust::detail::normal_iterator<thrust::device_ptr<int> >, long, thrust::use_default>, thrust::detail::normal_iterator<thrust::pointer<long, thrust::system::cuda::detail::tag, thrust::use_default, thrust::use_default> >, thrust::plus<long>, thrust::system::detail::internal::uniform_decomposition<long>, thrust::system::cuda::detail::detail::blocked_thread_array> >(thrust::system::cuda::detail::commutative_reduce_intervals_closure<thrust::transform_iterator<thrust::detail::predicate_to_integral<is_odd<int>, long>, thrust::detail::normal_iterator<thrust::device_ptr<int> >, long, thrust::use_default>, thrust::detail::normal_iterator<thrust::pointer<long, thrust::system::cuda::detail::tag, thrust::use_default, thrust::use_default> >, thrust::plus<long>, thrust::system::detail::internal::uniform_decomposition<long>, thrust::system::cuda::detail::detail::blocked_thread_array>)
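The demangled symbol suggests that the missing kernel is the reduction step of copy_if: the predicate (is_odd<int>) is wrapped in predicate_to_integral so each element maps to a 0 or 1 of type long, and those flags are summed with plus<long>. The following is a minimal host-side sketch of that idea in plain C++, not Thrust's actual implementation. The is_odd body and the helper names count_selected and compact_odd are assumptions made for illustration.

```cpp
#include <vector>

// Assumed definition of the example's predicate (hypothetical body).
template <typename T>
struct is_odd {
    bool operator()(T x) const { return x % 2 != 0; }
};

// Host-side analogue of predicate_to_integral: maps each element to 0 or 1.
template <typename Predicate, typename Integral>
struct predicate_to_integral {
    Predicate pred;

    template <typename T>
    Integral operator()(const T& x) const {
        return pred(x) ? Integral(1) : Integral(0);
    }
};

// Hypothetical helper: counts the selected elements by summing the 0/1 flags,
// which is the role the plus<long> reduction appears to play in the symbol.
inline long count_selected(const std::vector<int>& input) {
    predicate_to_integral<is_odd<int>, long> to_flag{is_odd<int>()};
    long total = 0;
    for (int x : input) {
        total += to_flag(x);
    }
    return total;
}

// Hypothetical helper: sequential stream compaction that keeps the odd values
// in their original order.
inline std::vector<int> compact_odd(const std::vector<int>& input) {
    std::vector<int> output;
    output.reserve(static_cast<std::vector<int>::size_type>(count_selected(input)));
    is_odd<int> pred;
    for (int x : input) {
        if (pred(x)) {
            output.push_back(x);
        }
    }
    return output;
}
```

Calling compact_odd on {1, 2, 3, 4, 5} keeps 1, 3 and 5. If the counting reduction never runs on the device, as the error indicates, the later stages of copy_if have no sizes to work with.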

If you look into ../thrust/system/cuda/detail/reduce_intervals.inl, you will notice the following static assert:

THRUST_STATIC_ASSERT( (thrust::detail::depend_on_instantiation<InputIterator, THRUST_DEVICE_COMPILER == THRUST_DEVICE_COMPILER_NVCC>::value) );

I think this should never have compiled in the first place: the assertion only passes when the device compiler is NVCC.
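To illustrate why the assertion matters, here is a minimal sketch of the idiom in standard C++. It assumes depend_on_instantiation simply forwards its bool parameter as ::value while tying it to a type parameter, so the check is deferred until the template is instantiated. The SKETCH_* macros and reduce_intervals_sketch are hypothetical stand-ins, not Thrust names.

```cpp
// Hypothetical compiler identifiers standing in for THRUST_DEVICE_COMPILER_*.
#define SKETCH_DEVICE_COMPILER_NVCC 1
#define SKETCH_DEVICE_COMPILER_OTHER 2

// Default to "NVCC" so this sketch compiles; define SKETCH_DEVICE_COMPILER
// as SKETCH_DEVICE_COMPILER_OTHER to see the assertion fire at the call site.
#ifndef SKETCH_DEVICE_COMPILER
#define SKETCH_DEVICE_COMPILER SKETCH_DEVICE_COMPILER_NVCC
#endif

// Carries a compile-time bool but makes it depend on T, so an assertion on
// ::value is only evaluated when the enclosing template is instantiated.
template <typename T, bool Condition>
struct depend_on_instantiation {
    static constexpr bool value = Condition;
};

// Hypothetical function guarded the same way as the code in the issue.
template <typename InputIterator>
long reduce_intervals_sketch(InputIterator first, InputIterator last) {
    static_assert(
        depend_on_instantiation<
            InputIterator,
            SKETCH_DEVICE_COMPILER == SKETCH_DEVICE_COMPILER_NVCC>::value,
        "this code path requires the NVCC device compiler");

    // Plain sequential sum, standing in for the real reduction.
    long sum = 0;
    for (; first != last; ++first) {
        sum += *first;
    }
    return sum;
}
```

With the default setting, a call such as reduce_intervals_sketch(data, data + 3) compiles and sums the range. With the other compiler identifier, the template still parses, but any call that instantiates it is rejected at compile time. Under that reading, a build that reaches this path with a non-NVCC device compiler would be expected to fail during compilation rather than at kernel launch.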

Appears to be fixed. I added a commit that changes the max_block and max_memory numbers to more reasonable values. It was slow before, but it is better now.