LLNL/RAJAPerf

MAT_MAT_SHARED checksums are bad

rhornung67 opened this issue · 6 comments

Checksums for all other variants of the MAT_MAT_SHARED kernel appear to match one another, but they do not match the checksum for the Base_Seq variant.

artv3 commented

Hi @rhornung67, do we know which compiler was used? I just tried nvcc10.2.89-70-gcc8.3.1 and everything checks out for me:

===================================================================================================                  
Checksum Report                                                                                                      
===================================================================================================                  
Kernel                                                                                                               
........................................................                                                             
Variants              Checksum                    Checksum Diff                                                      
                                                  (vs. first variant listed)                                         
----------------------------------------------------------------------------------------                             
Basic_MAT_MAT_SHARED                                                                                                 
........................................................                                                             
Base_Seq-default      1136.6199452543779143       0.0000000000000000000                                              
Lambda_Seq-default    1136.6199452543779143       0.0000000000000000000                                              
RAJA_Seq-default      1136.6199452543779143       0.0000000000000000000                                              
Base_OpenMP-default   1136.6199452543779143       0.0000000000000000000                                              
Lambda_OpenMP-default 1136.6199452543779143       0.0000000000000000000                                              
RAJA_OpenMP-default   1136.6199452543779143       0.0000000000000000000                                              
Base_CUDA-block_256   1136.6199452543779143       0.0000000000000000000                                              
Lambda_CUDA-block_256 1136.6199452543779143       0.0000000000000000000                                              
RAJA_CUDA-block_256   1136.6199452543779143       0.0000000000000000000                                              
                                                                                                                     
-------------------------------------------------------     
artv3 commented

I looked into this a bit more with different compilers. I think I see the issue with nvcc10.2.89-70-clang10.0.1:

===================================================================================================                  
Checksum Report                                                                                                      
===================================================================================================                  
Kernel                                                                                                               
........................................................                                                             
Variants              Checksum                    Checksum Diff                                                      
                                                  (vs. first variant listed)                                         
----------------------------------------------------------------------------------------                             
Basic_MAT_MAT_SHARED                                                                                                 
........................................................                                                             
Base_Seq-default      1132.3939369600033145       0.0000000000000000000                                              
Lambda_Seq-default    1136.6199452543779143       -4.2260082943745997975                                             
RAJA_Seq-default      1136.6199452543779143       -4.2260082943745997975                                             
Base_OpenMP-default   1136.6199452543779143       -4.2260082943745997975                                             
Lambda_OpenMP-default 1132.3939369600033145       0.0000000000000000000                                              
RAJA_OpenMP-default   1136.6199452543779143       -4.2260082943745997975                                             
Base_CUDA-block_256   1136.6199452543779143       -4.2260082943745997975                                             
Lambda_CUDA-block_256 1136.6199452543779143       -4.2260082943745997975                                             
RAJA_CUDA-block_256   1136.6199452543779143       -4.2260082943745997975                                             
                                                                                                                     
-------------------------------------------------------   

But the mismatch goes away with a newer version of clang (nvcc10.2.89-70-clang12.0.0):

===================================================================================================                  
Checksum Report                                                                                                      
===================================================================================================                  
Kernel                                                                                                               
........................................................                                                             
Variants              Checksum                    Checksum Diff                                                      
                                                  (vs. first variant listed)                                         
----------------------------------------------------------------------------------------                             
Basic_MAT_MAT_SHARED                                                                                                 
........................................................                                                             
Base_Seq-default      1136.6199452543779143       0.0000000000000000000                                              
Lambda_Seq-default    1136.6199452543779143       0.0000000000000000000                                              
RAJA_Seq-default      1136.6199452543779143       0.0000000000000000000                                              
Base_OpenMP-default   1136.6199452543779143       0.0000000000000000000                                              
Lambda_OpenMP-default 1136.6199452543779143       0.0000000000000000000                                              
RAJA_OpenMP-default   1136.6199452543779143       0.0000000000000000000                                              
Base_CUDA-block_256   1136.6199452543779143       0.0000000000000000000                                              
Lambda_CUDA-block_256 1136.6199452543779143       0.0000000000000000000                                              
RAJA_CUDA-block_256   1136.6199452543779143       0.0000000000000000000                                              
                                                                                                                     
-------------------------------------------------------    
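
The "Checksum Diff" column in the reports above can be reproduced by hand. The following is a minimal sketch, not RAJAPerf code: it assumes the diff is the first listed variant's checksum minus each variant's checksum (which is what the signs in the clang 10.0.1 report suggest), and the function names and the tolerance are hypothetical. The values are copied from the clang 10.0.1 report.

```python
from decimal import Decimal

# Checksums copied from the nvcc10.2.89-70-clang10.0.1 report in this thread.
CLANG10_CHECKSUMS = [
    ("Base_Seq-default", "1132.3939369600033145"),
    ("Lambda_Seq-default", "1136.6199452543779143"),
    ("RAJA_Seq-default", "1136.6199452543779143"),
    ("Base_OpenMP-default", "1136.6199452543779143"),
    ("Lambda_OpenMP-default", "1132.3939369600033145"),
    ("RAJA_OpenMP-default", "1136.6199452543779143"),
    ("Base_CUDA-block_256", "1136.6199452543779143"),
    ("Lambda_CUDA-block_256", "1136.6199452543779143"),
    ("RAJA_CUDA-block_256", "1136.6199452543779143"),
]


def checksum_diffs(rows):
    """Return (variant, reference - checksum), using the first row as reference.

    The sign convention is an assumption inferred from the report.
    """
    reference = Decimal(rows[0][1])
    return [(name, reference - Decimal(value)) for name, value in rows]


def mismatched(rows, tol=Decimal("1e-10")):
    """Return the variants whose diff exceeds a (hypothetical) tolerance."""
    return [name for name, diff in checksum_diffs(rows) if abs(diff) > tol]


if __name__ == "__main__":
    for name, diff in checksum_diffs(CLANG10_CHECKSUMS):
        print(f"{name:<22} {diff}")
```

Decimal arithmetic is used here only so that the printed digits are not rounded to double precision. Note that a nonzero diff says a variant disagrees with the first one listed, not which of the two is wrong.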

rhornung67 commented

I posted this issue based on the results with clang 10. I observed the same checksum issues you posted here. I don't understand why the CUDA variant results are messed up since it's the same nvcc version but a different clang version. Let's leave this issue open for a while. I'm working on some code cleanup and fixes in RAJAPerf and I may dig further if I have time.

artv3 commented

I think it's the host runs that are incorrect; all the other runs agree on the following checksum value:

1136.6199452543779143

That seems correct. The fact that the Base_Seq variant checksum is significantly off is disturbing.
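
Because the report only compares against the first variant listed, a bad first variant makes every correct variant look wrong. One way to see which variants actually disagree is to group them by checksum value. This is a minimal sketch under that idea, using the clang 10.0.1 values posted earlier in this thread; the function name is hypothetical and nothing here is part of RAJAPerf. Agreement among most variants suggests, but does not prove, which value is correct.

```python
from collections import defaultdict

# Checksums copied from the nvcc10.2.89-70-clang10.0.1 report in this thread.
# They are kept as strings so that equal values compare exactly.
CHECKSUMS = {
    "Base_Seq-default": "1132.3939369600033145",
    "Lambda_Seq-default": "1136.6199452543779143",
    "RAJA_Seq-default": "1136.6199452543779143",
    "Base_OpenMP-default": "1136.6199452543779143",
    "Lambda_OpenMP-default": "1132.3939369600033145",
    "RAJA_OpenMP-default": "1136.6199452543779143",
    "Base_CUDA-block_256": "1136.6199452543779143",
    "Lambda_CUDA-block_256": "1136.6199452543779143",
    "RAJA_CUDA-block_256": "1136.6199452543779143",
}


def majority_and_outliers(checksums):
    """Return (most common checksum, variants that report a different value)."""
    groups = defaultdict(list)
    for variant, value in checksums.items():
        groups[value].append(variant)
    # The value reported by the largest number of variants.
    majority = max(groups, key=lambda value: len(groups[value]))
    outliers = [v for v, value in checksums.items() if value != majority]
    return majority, outliers


if __name__ == "__main__":
    majority, outliers = majority_and_outliers(CHECKSUMS)
    print("majority checksum:", majority)
    print("outliers:", ", ".join(outliers))
```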

@artv3 close this?