Incompatibility with CUDA 11.3
stephenswat opened this issue · 6 comments
This is a quick continuation of #113, where we find that `traccc` is currently not compatible with CUDA 11.3, and I would like to know why. I'll keep this as a running log of my findings.
This is the compatibility matrix of CUDA versions installed on `atspot01`:
CUDA toolkit version | Works |
---|---|
10.1.243 | ❌ (expected) |
10.2.89 | ❌ (expected) |
11.0.3 | ✔️ |
11.1.1 | ✔️ |
11.2.2 | ✔️ |
11.3.1 | ❌ |
11.4.3 | ✔️ |
11.5.0 | ✔️ |
11.5.1 | ✔️ |
Okay, there is some extremely odd behaviour happening inside `nvcc`, and I think inside `cudafe++`. It seems that for CUDA toolkit 11.3.1, the `cudafe1.cpp` translations of `counting_grid_capacities.cu` and `populating_grid.cu` are incorrectly referencing `spacepoint_t`:
zone_t(scalar v, const spacepoint_t< neighbor_t, 2> &nhood) const {
dindex_sequence zone(scalar v, const spacepoint_t< dindex, 2U> &nhood) const {
dindex_sequence zone(scalar v, const spacepoint_t< dindex, 2U> &nhood) const {
Here are the corresponding lines for CUDA 11.4.3:
zone_t(scalar v, const array_type< neighbor_t, 2> &nhood) const {
dindex_sequence zone(scalar v, const array_type< dindex, 2U> &nhood) const {
dindex_sequence zone(scalar v, const array_type< scalar, 2U> &nhood) const {
The corresponding lines from `detray/core/include/detray/grids/axis.hpp` are:
zone_t(scalar v, const array_type<neighbor_t, 2> &nhood) const {
dindex_sequence zone(scalar v, const array_type<dindex, 2> &nhood) const {
dindex_sequence zone(scalar v, const array_type<scalar, 2> &nhood) const {
These files are generated from the corresponding `.cpp4.ii` files by `cudafe++`.
I can confirm that the `.cpp4.ii` files have identical versions of these lines.
Invoking the two versions of `cudafe++` (11.3.1 and 11.4.3) on exactly the same input (the `cudafe1.stub.cpp` generated by `cicc` 11.3.1, and the `.cpp4.ii` by the 11.3.1 preprocessor) results in the same behaviour: the 11.3.1 version erroneously inserts `spacepoint_t` where it shouldn't be. Running the 11.2.2 version of `cudafe++` produces the same correct output that 11.4.3 does.
Okay, I am sufficiently convinced that this is a bug in `cudafe++`.
Okay, I can't really debug this any further, because `cudafe++` is opaque as hell, and as far as I know there aren't really any changelogs or documentation for it. However, I have boiled down the issue to the `detray::axis::regular` type. My guess is that `cudafe++` can't cope with the complex kind `(* → uint → *) → (*^n → *) → *`, which I suspect is either due to the n-ary nature of the kind of the second type parameter, or because the first type parameter accepts a non-`*` kind.
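For readers unfamiliar with the kind notation, here is a minimal C++ sketch of a class template with that shape: the first parameter is a template taking a type and an unsigned integer, the second is a variadic template over types. Everything in it is an assumption made for illustration: `regular_axis`, its members, and the use of `std::array` and `std::vector` as defaults are hypothetical stand-ins, not the actual `detray::axis::regular` definition, and the sketch has not been checked against the 11.3.1 `cudafe++` as a reproducer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in with kind (* -> uint -> *) -> (*^n -> *) -> *:
//  - array_type takes a type and an unsigned integer (like std::array),
//  - vector_type takes any number of types (like std::vector<T, Alloc>).
template <template <typename, std::size_t> class array_type = std::array,
          template <typename...> class vector_type = std::vector>
struct regular_axis {
    std::size_t n_bins;  // number of equally sized bins
    float min;           // lower edge of the axis
    float max;           // upper edge of the axis

    // Bin index of value v, clamped to the valid range [0, n_bins - 1].
    std::size_t bin(float v) const {
        const float rel = (v - min) / (max - min) * static_cast<float>(n_bins);
        const long ibin = static_cast<long>(rel);
        const long last = static_cast<long>(n_bins) - 1;
        return static_cast<std::size_t>(std::max(0L, std::min(ibin, last)));
    }

    // Same signature shape as the zone(...) overloads quoted above: the
    // neighbourhood is passed through the array_type template parameter.
    vector_type<std::size_t> zone(
        float v, const array_type<std::size_t, 2> &nhood) const {
        const long ibin = static_cast<long>(bin(v));
        const long last = static_cast<long>(n_bins) - 1;
        const long lo = std::max(0L, ibin - static_cast<long>(nhood[0]));
        const long hi = std::min(last, ibin + static_cast<long>(nhood[1]));
        vector_type<std::size_t> out;
        for (long i = lo; i <= hi; ++i) {
            out.push_back(static_cast<std::size_t>(i));
        }
        return out;
    }
};

// With the defaults, the parameter type of zone() must resolve to
// std::array<std::size_t, 2> and the return type to std::vector<std::size_t>.
static_assert(
    std::is_same<decltype(std::declval<regular_axis<>>().zone(
                     0.f, std::declval<std::array<std::size_t, 2>>())),
                 std::vector<std::size_t>>::value,
    "zone() should return the vector_type instantiation");
```

With ten bins on the range 0 to 10, `zone(2.5f, nhood)` with a neighbourhood of one bin on each side returns the bin indices 1, 2 and 3. The point of the sketch is only the template parameter list: the `array_type<..., 2>` in the signature is the position where the 11.3.1 output shows `spacepoint_t` instead.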
The symptom of this is that it starts substituting (seemingly) random (incompatible) types, such as `spacepoint_t`, where it expects the array type or the vector type. I presume that this might be some kind of indexing error happening at template resolution time, but I don't have enough evidence to make any concrete claims.
To conclude, CUDA 11.3.1 is completely bat-shit insane. The only next steps might be to investigate CUDA 11.3.0 and CUDA 11.4.0, the directly preceding and following versions.