MAT_MAT_SHARED checksums are bad
rhornung67 opened this issue · 6 comments
Checksums for all variants of the MAT_MAT_SHARED kernel other than Base_Seq appear to match one another, but they do not match the checksum for the Base_Seq variant.
Hi @rhornung67, do we know which compiler was used? I just tried nvcc10.2.89-70-gcc8.3.1 and everything checks out for me:
===================================================================================================
Checksum Report
===================================================================================================
Kernel
........................................................
Variants Checksum Checksum Diff
(vs. first variant listed)
----------------------------------------------------------------------------------------
Basic_MAT_MAT_SHARED
........................................................
Base_Seq-default 1136.6199452543779143 0.0000000000000000000
Lambda_Seq-default 1136.6199452543779143 0.0000000000000000000
RAJA_Seq-default 1136.6199452543779143 0.0000000000000000000
Base_OpenMP-default 1136.6199452543779143 0.0000000000000000000
Lambda_OpenMP-default 1136.6199452543779143 0.0000000000000000000
RAJA_OpenMP-default 1136.6199452543779143 0.0000000000000000000
Base_CUDA-block_256 1136.6199452543779143 0.0000000000000000000
Lambda_CUDA-block_256 1136.6199452543779143 0.0000000000000000000
RAJA_CUDA-block_256 1136.6199452543779143 0.0000000000000000000
-------------------------------------------------------
I looked into this a bit more with different compilers, and I think I can reproduce the issue with nvcc10.2.89-70-clang10.0.1:
===================================================================================================
Checksum Report
===================================================================================================
Kernel
........................................................
Variants Checksum Checksum Diff
(vs. first variant listed)
----------------------------------------------------------------------------------------
Basic_MAT_MAT_SHARED
........................................................
Base_Seq-default 1132.3939369600033145 0.0000000000000000000
Lambda_Seq-default 1136.6199452543779143 -4.2260082943745997975
RAJA_Seq-default 1136.6199452543779143 -4.2260082943745997975
Base_OpenMP-default 1136.6199452543779143 -4.2260082943745997975
Lambda_OpenMP-default 1132.3939369600033145 0.0000000000000000000
RAJA_OpenMP-default 1136.6199452543779143 -4.2260082943745997975
Base_CUDA-block_256 1136.6199452543779143 -4.2260082943745997975
Lambda_CUDA-block_256 1136.6199452543779143 -4.2260082943745997975
RAJA_CUDA-block_256 1136.6199452543779143 -4.2260082943745997975
-------------------------------------------------------
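The Diff column in these reports is computed relative to the first variant listed, which here is Base_Seq. The clang 10 numbers are consistent with Diff = (first variant's checksum) - (this variant's checksum), e.g. 1132.3939369600033145 - 1136.6199452543779143 = -4.2260082943745998. Below is a minimal Python sketch, not RAJAPerf code (the helper name `checksum_diffs` is hypothetical), that rebuilds that column from the printed checksums. It uses `Decimal` so the subtraction is exact for the printed values; the report's last few digits may differ slightly because the printed checksums are themselves rounded.

```python
from decimal import Decimal

# Checksums copied from the nvcc10.2.89-70-clang10.0.1 report,
# in the order the report lists them.
CLANG10_REPORT = [
    ("Base_Seq-default", Decimal("1132.3939369600033145")),
    ("Lambda_Seq-default", Decimal("1136.6199452543779143")),
    ("RAJA_Seq-default", Decimal("1136.6199452543779143")),
    ("Base_OpenMP-default", Decimal("1136.6199452543779143")),
    ("Lambda_OpenMP-default", Decimal("1132.3939369600033145")),
    ("RAJA_OpenMP-default", Decimal("1136.6199452543779143")),
    ("Base_CUDA-block_256", Decimal("1136.6199452543779143")),
    ("Lambda_CUDA-block_256", Decimal("1136.6199452543779143")),
    ("RAJA_CUDA-block_256", Decimal("1136.6199452543779143")),
]


def checksum_diffs(rows):
    """Return (variant, diff) pairs with diff = first listed checksum - this checksum."""
    reference = rows[0][1]  # every row is compared against the first variant listed
    return [(name, reference - value) for name, value in rows]


if __name__ == "__main__":
    for name, diff in checksum_diffs(CLANG10_REPORT):
        print(f"{name:<24}{diff:.16f}")
```

Because Base_Seq is listed first, every variant that disagrees with it gets a nonzero Diff, whichever of the two values is actually correct.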
But it goes away with a newer version of clang:
nvcc10.2.89-70-clang12.0.0
===================================================================================================
Checksum Report
===================================================================================================
Kernel
........................................................
Variants Checksum Checksum Diff
(vs. first variant listed)
----------------------------------------------------------------------------------------
Basic_MAT_MAT_SHARED
........................................................
Base_Seq-default 1136.6199452543779143 0.0000000000000000000
Lambda_Seq-default 1136.6199452543779143 0.0000000000000000000
RAJA_Seq-default 1136.6199452543779143 0.0000000000000000000
Base_OpenMP-default 1136.6199452543779143 0.0000000000000000000
Lambda_OpenMP-default 1136.6199452543779143 0.0000000000000000000
RAJA_OpenMP-default 1136.6199452543779143 0.0000000000000000000
Base_CUDA-block_256 1136.6199452543779143 0.0000000000000000000
Lambda_CUDA-block_256 1136.6199452543779143 0.0000000000000000000
RAJA_CUDA-block_256 1136.6199452543779143 0.0000000000000000000
-------------------------------------------------------
I posted this issue based on the results with clang 10, and I observed the same checksum problems you posted here. I don't understand why the CUDA variant results are affected, since it's the same nvcc version and only the clang version differs. Let's leave this issue open for a while. I'm working on some code cleanup and fixes in RAJAPerf, and I may dig further if I have time.
I think it's the host runs that are incorrect; all the other runs agree on the following checksum value:
1136.6199452543779143
That seems correct. The fact that the Base_Seq variant checksum is significantly off is disturbing.
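Since the Diff column is measured against whichever variant is listed first, a wrong Base_Seq checksum makes every correct variant look wrong. One way to read such a report, sketched below in Python purely as an illustration (not part of RAJAPerf; `consensus_checksum` is a hypothetical helper), is to take the checksum most variants agree on as the reference and list the variants that disagree with it. Exact equality is an assumption that holds for these printed values; a real comparison would normally use a tolerance.

```python
from collections import Counter
from decimal import Decimal

# Checksums from the nvcc10.2.89-70-clang10.0.1 report in this thread.
CLANG10_REPORT = [
    ("Base_Seq-default", Decimal("1132.3939369600033145")),
    ("Lambda_Seq-default", Decimal("1136.6199452543779143")),
    ("RAJA_Seq-default", Decimal("1136.6199452543779143")),
    ("Base_OpenMP-default", Decimal("1136.6199452543779143")),
    ("Lambda_OpenMP-default", Decimal("1132.3939369600033145")),
    ("RAJA_OpenMP-default", Decimal("1136.6199452543779143")),
    ("Base_CUDA-block_256", Decimal("1136.6199452543779143")),
    ("Lambda_CUDA-block_256", Decimal("1136.6199452543779143")),
    ("RAJA_CUDA-block_256", Decimal("1136.6199452543779143")),
]


def consensus_checksum(rows):
    """Return the most common checksum and the variants whose checksum differs from it."""
    counts = Counter(value for _, value in rows)
    consensus, _ = counts.most_common(1)[0]
    outliers = [name for name, value in rows if value != consensus]
    return consensus, outliers


if __name__ == "__main__":
    value, outliers = consensus_checksum(CLANG10_REPORT)
    print("consensus:", value)
    print("outliers:", ", ".join(outliers))
```

For the clang 10 run this picks 1136.6199452543779143 as the consensus and flags Base_Seq-default and Lambda_OpenMP-default, both host-side variants, which matches the observation above.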
@artv3 close this?