kaldi-asr/kaldi

Missing conditional statement can cause concurrency issues in cu-kernels.cu

Lebronmydx opened this issue · 5 comments

In the device function "_sum_reduce", with the grid dimension set to (3, 1, 1) and the block dimension set to (3, 1, 1), a race occurs at line 53:

buffer[threadIdx.x - halfPoint] += temp;

Would it be better to add a conditional statement like the one below to avoid this case?

template<typename Real>
__device__
static Real _sum_reduce(Real buffer[]) {
  // Total number of active threads
  int32_cuda nTotalThreads = blockDim.x;
  if (blockIdx.x >= 1)
    return Real(0);  // only block 0 performs the reduction; the function must return a Real
  ...
}
galv commented

Guys, we are working on fixing some race conditions that occur on newer architectures due to changes in semantics. Justin Luitjens pointed them out, and @desh2608 is working on it. I think this is one of them.

galv commented

I'm fairly certain that this has nothing to do with the newer architectures' lack of warp synchronization (the ability to assume that all 32 threads in a warp execute in lockstep). Mentioning @luitjens so he is aware of this.
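For context on the warp-synchronization point: since Volta, independent thread scheduling means the lanes of a warp are not guaranteed to execute in lockstep, so warp-level code has to synchronize explicitly. A minimal sketch of a warp sum that does not rely on lockstep execution, assuming CUDA 9 or later for the `*_sync` intrinsics; `warp_sum`, `warp_sum_kernel`, and `gpu_warp_sum` are hypothetical names, not Kaldi code.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical warp-level sum that does not assume lockstep execution:
// __shfl_down_sync exchanges registers among the lanes named in the mask
// and synchronizes those lanes as part of the exchange.
__device__ float warp_sum(float v) {
  const unsigned full_mask = 0xffffffffu;  // all 32 lanes take part
  for (int offset = 16; offset > 0; offset >>= 1)
    v += __shfl_down_sync(full_mask, v, offset);
  return v;  // the complete sum is only guaranteed in lane 0
}

__global__ void warp_sum_kernel(const float* in, float* out) {
  float total = warp_sum(in[threadIdx.x]);
  if (threadIdx.x == 0) *out = total;
}

// Hypothetical host wrapper: launches one block containing one full warp.
float gpu_warp_sum(const std::vector<float>& h) {
  assert(h.size() == 32);  // one value per lane
  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, 32 * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, h.data(), 32 * sizeof(float), cudaMemcpyHostToDevice);
  warp_sum_kernel<<<1, 32>>>(d_in, d_out);
  float result = 0.0f;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

With 32 ones as input, `gpu_warp_sum` returns 32. Note that this concern is about lanes within one warp; it says nothing about several thread blocks, which is the situation in this report.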

I don't think there's a bug here. It's just not documented that this code is expected to be called with only a single thread block.
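To make that contract concrete, here is a minimal sketch of a tree-based block sum in the spirit of the `halfPoint` loop in `_sum_reduce`. It assumes the buffer is `__shared__` memory (so every block gets its own private copy) and that the kernel is launched with a single block; `tree_sum_reduce`, `block_sum_kernel`, and `gpu_block_sum` are hypothetical names, not Kaldi's API. The device-side `assert(gridDim.x == 1)` is one possible way to turn the undocumented expectation into a checked one.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical sketch, not Kaldi's code: tree-based sum over buffer[0..blockDim.x).
// Contract: one thread block per call, and `buffer` must be private to that block.
template <typename Real>
__device__ Real tree_sum_reduce(Real buffer[]) {
  int n = blockDim.x;               // number of live partial sums
  __syncthreads();                  // make every thread's write to buffer visible
  while (n > 1) {
    int half = (n + 1) / 2;         // round up so odd sizes such as 3 are handled
    // Reads come from [half, n), writes go to [0, n - half): the ranges are disjoint.
    if (threadIdx.x >= half && threadIdx.x < n)
      buffer[threadIdx.x - half] += buffer[threadIdx.x];
    __syncthreads();                // finish this level before the next one reads it
    n = half;
  }
  return buffer[0];
}

__global__ void block_sum_kernel(const float* in, int len, float* out) {
  assert(gridDim.x == 1);           // enforce the single-block contract
  extern __shared__ float buffer[]; // one slot per thread, private to this block
  float acc = 0.0f;
  for (int i = threadIdx.x; i < len; i += blockDim.x) acc += in[i];
  buffer[threadIdx.x] = acc;
  float total = tree_sum_reduce(buffer);
  if (threadIdx.x == 0) *out = total;
}

// Hypothetical host wrapper: always launches exactly one block of `threads` threads.
float gpu_block_sum(const std::vector<float>& h, int threads) {
  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, h.size() * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
  block_sum_kernel<<<1, threads, threads * sizeof(float)>>>(
      d_in, static_cast<int>(h.size()), d_out);
  float result = 0.0f;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

For example, summing {1, 2, 3, 4, 5} with 3 threads, the block size from the report, returns 15. Rounding `half` up is what lets an odd block size fold correctly: thread 2 first adds into slot 0, then thread 1 adds into slot 0. If several blocks were instead handed the same global buffer, they would race on it, which is why the single-block expectation matters.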

Agree with @galv. My changes only switch to CUB types such as BlockReduce, which I suppose would still not work with multiple blocks.
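For comparison, a minimal sketch of a per-block sum with CUB's `cub::BlockReduce`, assuming a CUDA toolkit that bundles CUB (11.0 or later) and a block size fixed at compile time. The kernel and wrapper names are hypothetical, and this is not the code from the pending changes. Like the hand-written version, `BlockReduce` only combines the threads of one block, so reducing across several blocks would still need a separate step.

```cuda
#include <cassert>
#include <vector>
#include <cub/cub.cuh>
#include <cuda_runtime.h>

constexpr int kThreads = 128;  // must match blockDim.x at launch

__global__ void cub_block_sum_kernel(const float* in, int len, float* out) {
  assert(blockDim.x == kThreads);  // BlockReduce is specialized for kThreads threads
  using BlockReduce = cub::BlockReduce<float, kThreads>;
  __shared__ BlockReduce::TempStorage temp_storage;  // scratch space private to this block
  float acc = 0.0f;
  for (int i = threadIdx.x; i < len; i += blockDim.x) acc += in[i];
  float total = BlockReduce(temp_storage).Sum(acc);  // result is valid in thread 0 only
  if (threadIdx.x == 0) *out = total;
}

// Hypothetical host wrapper: launches a single block of kThreads threads.
float gpu_cub_block_sum(const std::vector<float>& h) {
  float* d_in = nullptr;
  float* d_out = nullptr;
  cudaMalloc(&d_in, h.size() * sizeof(float));
  cudaMalloc(&d_out, sizeof(float));
  cudaMemcpy(d_in, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
  cub_block_sum_kernel<<<1, kThreads>>>(d_in, static_cast<int>(h.size()), d_out);
  float result = 0.0f;
  cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel
  cudaFree(d_in);
  cudaFree(d_out);
  return result;
}
```

Launched with more than one block, each block would compute its own partial sum and all of them would write the same `*out`, so the single-block contract carries over unchanged.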

OK, so this is not a bug; closing.