Computation error when offloading the PolyBench 3MM benchmark with OpenMP
rameshka commented
Hi,
I ran into a computation error (Non-Matching CPU-GPU Outputs Beyond Error Threshold) when the 3mm_teams application was offloaded to my NVIDIA GPU. The error threshold was around 1047648. This might have resulted from data races between thread teams. I was able to fix the issue by synchronizing the thread teams for each matrix multiplication:
```c
#pragma omp target teams distribute parallel for collapse(2)
for (int i = 0; i < NI; i++)
{
    for (int j = 0; j < NJ; j++)
    {
        E[i*NJ + j] = 0;
        for (int k = 0; k < NK; ++k)
        {
            E[i*NJ + j] += A[i*NK + k] * B[k*NJ + j];
        }
    }
}
```
Each multiplication was synchronized as above, and the device data environment was configured beforehand as below:
`#pragma omp target data map(from:G[:NI*NL] ) map(to: A[:NI*NK], B[:NK*NJ],C[:NJ*NM], D[:NM*NL],E[:NI*NJ],F[:NJ*NL])`
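
For reference, here is a minimal sketch of how the two pieces might fit together. It assumes `float` data, row-major flat arrays, and a hypothetical wrapper function `mm3_offload` that is not part of the benchmark source. Each product gets its own `target teams distribute parallel for` region. Since none of them uses `nowait`, the host waits for one region to finish before launching the next, which should order the three multiplications. As a further assumption, E and F are mapped with `alloc` instead of `to`, because they are only produced and consumed on the device.

```c
/* Hypothetical wrapper (not from the benchmark): G = (A*B) * (C*D).
 * Shapes: A is NI x NK, B is NK x NJ, C is NJ x NM, D is NM x NL,
 *         E is NI x NJ, F is NJ x NL, G is NI x NL (all row-major). */
void mm3_offload(int NI, int NJ, int NK, int NL, int NM,
                 const float *A, const float *B,
                 const float *C, const float *D,
                 float *E, float *F, float *G)
{
    /* Keep all arrays resident on the device across the three regions.
     * Assumption: E and F are device-only intermediates, hence alloc. */
    #pragma omp target data map(to: A[:NI*NK], B[:NK*NJ], C[:NJ*NM], D[:NM*NL]) \
                            map(alloc: E[:NI*NJ], F[:NJ*NL]) \
                            map(from: G[:NI*NL])
    {
        /* Region 1: E = A * B. Each (i, j) pair is handled by one thread. */
        #pragma omp target teams distribute parallel for collapse(2)
        for (int i = 0; i < NI; i++) {
            for (int j = 0; j < NJ; j++) {
                float sum = 0.0f;
                for (int k = 0; k < NK; ++k)
                    sum += A[i*NK + k] * B[k*NJ + j];
                E[i*NJ + j] = sum;
            }
        }

        /* Region 2: F = C * D. Starts only after region 1 has completed. */
        #pragma omp target teams distribute parallel for collapse(2)
        for (int i = 0; i < NJ; i++) {
            for (int j = 0; j < NL; j++) {
                float sum = 0.0f;
                for (int k = 0; k < NM; ++k)
                    sum += C[i*NM + k] * D[k*NL + j];
                F[i*NL + j] = sum;
            }
        }

        /* Region 3: G = E * F. Reads E and F only after both are complete. */
        #pragma omp target teams distribute parallel for collapse(2)
        for (int i = 0; i < NI; i++) {
            for (int j = 0; j < NL; j++) {
                float sum = 0.0f;
                for (int k = 0; k < NJ; ++k)
                    sum += E[i*NJ + k] * F[k*NL + j];
                G[i*NL + j] = sum;
            }
        }
    }
}
```

Because E and F are mapped with `alloc`, their host copies are not updated when the code actually runs on a device; only G is copied back. Accumulating into a local `sum` is a small change from the snippet above that avoids repeated read-modify-write on device memory.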