NVIDIA/jitify

Clarification of implicit conversion requirements

DavidPoliakoff opened this issue · 4 comments

Hey folks,

Sorry for the bug-a-day pace; you just made too useful a product. This is the bug I'm iffiest about filing: if nothing immediately pops out at you, let me know and I'll try to get you a reproducer. It's 100% possible this is on our side.

I'm reading this, which mentions that implicit conversions in variadic packs are likely to result in a segfault. I'm seeing such a segfault, though my stack trace goes through this version of launch, which I believe is the one it's supposed to go through.

I can generate one of two functions:

header
__global__ void jitify_example_cu112_0(int specialization, int* d_array, int debug_do_not_merge = 0) {
  int i = threadIdx.x + blockDim.x * blockIdx.x;
  if (i < 1024) {
    for (int k = 0; k < 8; k++) {
      d_array[i] += specialization * specialization * specialization;
    }
  }
}

Or I can elide specialization by baking it in as a constant:

header
__global__ void jitify_example_cu112_0(int* d_array, int debug_do_not_merge = 0) {
  int i = threadIdx.x + blockDim.x * blockIdx.x;
  if (i < 1024) {
    for (int k = 0; k < 8; k++) {
      d_array[i] += 512;
    }
  }
}

The launch looks roughly like this:

static jitify::KernelLauncher* launcher;
template<class... Args>
void launch_assist(Args... args){
  launcher->launch(args...);
}

From that, I successfully create a program, an instantiation, ..., and a launcher. I'm invoking the launcher with an "invoke" pattern: I pack my args into a tuple and then call camp::invoke (a lot like std::apply, but more nvcc-friendly). The first path, in which two real arguments are passed, crashes. The second succeeds. An equivalent operation piped through Cling seems to work.
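To make the invoke pattern concrete, here is a minimal host-only C++ sketch. It assumes std::apply as a stand-in for camp::invoke and uses a hypothetical MockLauncher in place of jitify::KernelLauncher (it launches nothing; it only counts the argument pointers a launch would receive). launch_from_tuple is also a hypothetical helper. What it illustrates: expanding a tuple forwards exactly the tuple's elements, so a two-element tuple produces a two-argument launch, and nothing in the pack expansion knows about extra parameters declared in the kernel source.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for jitify::KernelLauncher. It launches nothing; it
// only gathers one pointer per argument, the way a CUDA driver launch
// receives its arguments as an array of void* (kernelParams).
struct MockLauncher {
  std::size_t last_arg_count = 0;

  template <typename... Args>
  void launch(Args... args) {
    // One pointer per argument actually passed. The kernel's declared
    // parameter list (and any default values in it) plays no part here.
    std::vector<void*> arg_ptrs{static_cast<void*>(&args)...};
    last_arg_count = arg_ptrs.size();
  }
};

static MockLauncher* launcher = nullptr;

template <class... Args>
void launch_assist(Args... args) {
  assert(launcher != nullptr);
  launcher->launch(args...);
}

// std::apply stands in for camp::invoke: it unpacks the tuple's elements
// into an argument pack and calls the lambda with them.
template <class... Args>
void launch_from_tuple(const std::tuple<Args...>& packed) {
  std::apply([](auto... args) { launch_assist(args...); }, packed);
}
```

For example, launch_from_tuple(std::make_tuple(8, d_array)) records two argument pointers, even though the first kernel above declares three parameters.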

Have you had any experience with expanding variadic packs causing you grief? Again, if nothing jumps out, I'll drill down a bit myself.

Thanks!

Oh, and I doubt it matters, but I do have #11 merged.

Disregard this issue: getting rid of the default argument solved it. Bizarre; that might be something you care about later, but it's avoidable. To clarify, moving from

header
__global__ void jitify_example_cu112_0(int specialization, int* d_array, int debug_do_not_merge = 0) {
  int i = threadIdx.x + blockDim.x * blockIdx.x;
  if (i < 1024) {
    for (int k = 0; k < 8; k++) {
      d_array[i] += specialization * specialization * specialization;
    }
  }
}

to

header
__global__ void jitify_example_cu112_0(int specialization, int* d_array) {
  int i = threadIdx.x + blockDim.x * blockIdx.x;
  if (i < 1024) {
    for (int k = 0; k < 8; k++) {
      d_array[i] += specialization * specialization * specialization;
    }
  }
}

solved the bug.

Interesting, I hadn't run into this before. Because kernels are launched via function pointers, I don't believe it's possible to support default arguments. (Unfortunately, there's also no easy way to sanity-check the number of arguments.)
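The function-pointer point can be illustrated in plain host C++. This is a sketch, not jitify code; example_kernel, KernelPtr and param_count are hypothetical names. A default argument belongs to a function's declaration, not to its type, so once a function is reached through a pointer, every parameter must be supplied. On the CUDA side, cuLaunchKernel takes kernelParams as an array of N pointers for a kernel with N parameters and reads the parameter count from the kernel image. Passing fewer pointers than the kernel declares would therefore make it read past the end of that array, which is one plausible source of the segfault (the thread does not confirm this).

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical host-side stand-in for jitify_example_cu112_0: same
// parameter list, including the defaulted third parameter.
void example_kernel(int specialization, int* d_array, int debug_do_not_merge = 0) {
  assert(d_array != nullptr);
  for (int i = 0; i < 4; ++i) {
    d_array[i] += specialization * specialization * specialization + debug_do_not_merge;
  }
}

// The default argument is not part of the function type, so a pointer to
// example_kernel is simply a pointer to a three-parameter function.
using KernelPtr = void (*)(int, int*, int);
constexpr KernelPtr kernel_ptr = &example_kernel;

// Through the pointer, all three arguments are required.
static_assert(std::is_invocable_v<KernelPtr, int, int*, int>);
static_assert(!std::is_invocable_v<KernelPtr, int, int*>);

// When the signature is available as a C++ type, the parameter count can be
// checked at compile time. A kernel compiled at runtime from a source string
// gives the host no such type, which is one reason the check is harder there.
template <typename R, typename... Params>
constexpr std::size_t param_count(R (*)(Params...)) {
  return sizeof...(Params);
}
static_assert(param_count(kernel_ptr) == 3);
```

Calling example_kernel(2, data) by name compiles because the declaration supplies the default, while kernel_ptr(2, data) does not compile; the launch path only ever sees the pointer.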