StanfordLegion/legion

Thrust types not ABI-stable between nvcc and non-nvcc TUs

Closed this issue · 8 comments

Thrust 2.03 added a nested namespace to some types, which encodes the CUDA architecture list. The problem is that now types diverge between files compiled with nvcc (which defines __CUDA_ARCH_LIST__) and gcc (which doesn't).

In particular, thrust::complex<float> becomes a different type in legion_redop.cu and legion_redop.cc, causing the latter to define each SomeLegionReduction<complex<T>>::identity using a different symbol name than what the former is looking for, eventually causing a linker error:

$ objdump -t  legion_redop.cu.o | grep identity | c++filt
[...]
0000000000000000         *UND*	0000000000000000 Legion::DiffReduction<thrust::THRUST_200300_800_NS::complex<float> >::identity
[...]
$ objdump -t legion_redop.cc.o | grep identity | c++filt
[...]
0000000000000020 g     O .rodata	0000000000000008 Legion::DiffReduction<thrust::THRUST_200300___CUDA_ARCH_LIST___NS::complex<float> >::identity
[...]

I confirmed that this is expected behavior; thrust types don't have a consistent ABI between nvcc and non-nvcc TUs. The guidance is to switch to using cuda::std:: types (from CCCL or libcu++).

Due to this bug, building Legion on an Ubuntu 22.04 with CUDA 12.4 apt packages will fail.

https://gitlab.com/StanfordLegion/legion/-/merge_requests/1151 works around the build failure, by allowing the user to control which Thrust is used in the build (any version before 2.0.3 will do), Of course we still need to address the root problem.

It turns out that simply replacing thrust::complex with cuda::std::complex is sufficient to fix all issues. The issue with using that is that now Legion_REDOP_COMPLEX would imply a dependency on libcudacxx 1.4.0, which only became part of the CUDA toolkit as of 11.3.

That seems reasonable to me. Our officially documented minimum is CUDA 10, but we don't test it in CI and I don't imagine anyone is actually using it. In Sapling, our oldest CUDA is currently 11.7.

I talked with @mpokorny today. He does indeed use the built-in complex reduction operators. I asked him to check on his CUDA versions to see if using 11.3 or later would be an issue for him.

I personally am fine with switching to using cuda::std::complex as I think it will probably be better maintained going forward and it's a more natural dependence on CUDA than on something like Thrust which has kind of been a part of CUDA but not really (even though it ships with it). The people maintaining cuda::std should be much better about their support and not making backwards breaking changes.

I agree with this approach. I think complex reductions are tricky enough that we want to "do it right" on behalf of our users.

Requiring CUDA version 11.3 or later would be fine with me.

I think this is fixed now?