omp2ocl/Unibench

Computation error resulting from OpenMP offloading of the 3MM PolyBench benchmark


Hi,
I ran into a computation error (Non-Matching CPU-GPU Outputs Beyond Error Threshold) when the 3mm_teams application was offloaded to my NVIDIA GPU. The error threshold was around 1047648. This may have been caused by data races between thread teams. I was able to fix the issue by synchronizing the thread teams for each matrix multiplication:

`#pragma omp target teams distribute parallel for collapse(2)
for (int i = 0; i < NI; i++)
{
  for (int j = 0; j < NJ; j++)
  {
    E[i*NJ + j] = 0;
    for (int k = 0; k < NK; ++k)
    {
      E[i*NJ + j] += A[i*NK + k] * B[k*NJ + j];
    }
  }
}`

Each multiplication was synchronized as above, and the device data environment was configured beforehand as below.

`#pragma omp target data map(from: G[:NI*NL]) map(to: A[:NI*NK], B[:NK*NJ], C[:NJ*NM], D[:NM*NL], E[:NI*NJ], F[:NJ*NL])`
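
For anyone who wants to reproduce the fix, here is a minimal sketch of how the two pieces might fit together. It is not the actual Unibench source: the function name `mm3_offload` and its parameter list are hypothetical, and I am assuming the usual 3MM structure (E = A*B, F = C*D, G = E*F) with the array sizes taken from the `map` clauses above. Each multiplication sits in its own `target teams` region; since a `target` construct without `nowait` makes the host wait for the region to finish, E and F are complete before the third loop reads them.

```c
#include <assert.h>

/* Hypothetical wrapper, not the Unibench code.
   A: NI x NK, B: NK x NJ, C: NJ x NM, D: NM x NL,
   E: NI x NJ, F: NJ x NL, G: NI x NL (all row-major). */
void mm3_offload(int NI, int NJ, int NK, int NL, int NM,
                 const float *A, const float *B,
                 const float *C, const float *D,
                 float *E, float *F, float *G)
{
  assert(NI > 0 && NJ > 0 && NK > 0 && NL > 0 && NM > 0);

  /* Same map clauses as in the issue: the inputs and the intermediates
     go to the device, only G is copied back when the region ends. */
  #pragma omp target data map(from: G[:NI*NL]) \
      map(to: A[:NI*NK], B[:NK*NJ], C[:NJ*NM], D[:NM*NL], E[:NI*NJ], F[:NJ*NL])
  {
    /* E = A * B; the host waits here until the region has finished. */
    #pragma omp target teams distribute parallel for collapse(2)
    for (int i = 0; i < NI; i++)
      for (int j = 0; j < NJ; j++) {
        E[i*NJ + j] = 0;
        for (int k = 0; k < NK; ++k)
          E[i*NJ + j] += A[i*NK + k] * B[k*NJ + j];
      }

    /* F = C * D, again in its own target region. */
    #pragma omp target teams distribute parallel for collapse(2)
    for (int i = 0; i < NJ; i++)
      for (int j = 0; j < NL; j++) {
        F[i*NL + j] = 0;
        for (int k = 0; k < NM; ++k)
          F[i*NL + j] += C[i*NM + k] * D[k*NL + j];
      }

    /* G = E * F; both intermediates are complete at this point. */
    #pragma omp target teams distribute parallel for collapse(2)
    for (int i = 0; i < NI; i++)
      for (int j = 0; j < NL; j++) {
        G[i*NL + j] = 0;
        for (int k = 0; k < NJ; ++k)
          G[i*NL + j] += E[i*NJ + k] * F[k*NL + j];
      }
  }
}
```

Built with `-fopenmp` but without an offload target, the regions should simply run on the host, which is handy for checking the result against a plain CPU loop. The exact offload flags depend on the compiler and toolchain.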