mrnorman/YAKL

SYCL Streams unit test fails on current main branch

mrnorman opened this issue · 6 comments

On current main branch, hash: d29e739

qsub -I -t 30 -n 1 -q florentia_debug
source jlse_gpu_O3.sh
make -j
make test
[ac.normanmr@florentia02:~/YAKL/unit/build/machines/jlse] >:O ./Streams/Streams 
Running on Intel(R) Graphics [0x0bd5]
1
YAKL FATAL ERROR:
ERROR: val1 is wrong
terminate called after throwing an instance of 'char const*'
Aborted

Also, if -DYAKL_ENABLE_STREAMS is removed from the flags, we get a segmentation fault, and that needs to be fixed as well.
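For tracking, the two failing configurations differ only in whether -DYAKL_ENABLE_STREAMS is present. A minimal sketch for switching between them, assuming the define is passed through YAKL_SYCL_FLAGS as in the Sunspot build script further down this thread; the helper name `yakl_sycl_flags` is hypothetical and is not part of YAKL or of jlse_gpu_O3.sh:

```shell
# Hypothetical helper (not part of YAKL): print a YAKL_SYCL_FLAGS value.
# "streams" keeps -DYAKL_ENABLE_STREAMS; any other argument drops it.
yakl_sycl_flags() {
  if [ "$1" = "streams" ]; then
    printf '%s\n' "-O3 -DYAKL_ENABLE_STREAMS"
  else
    printf '%s\n' "-O3"
  fi
}

# Usage (assumption: substituted into the cmake line of the machine script,
# leaving every other option unchanged):
#   cmake -DYAKL_ARCH="SYCL" \
#         -DYAKL_SYCL_FLAGS="$(yakl_sycl_flags nostreams)" \
#         ... ../../..
```

Building once per variant in separate build directories should keep the "val1 is wrong" failure and the segmentation fault reproducible side by side.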

I did see this error using the default runtime and modules you were loading. Fortunately, the experimental runtime and SDK that I used to test multi-stream support fixes the issue. I will put the details here for tracking, and we can close this once the official SDK includes the fix.

Thanks!

Sorry about the delay. The test works fine with both the default SDK and the experimental SDK, as shown in the logs below. I am still looking into why the Streams test segfaults when -DYAKL_ENABLE_STREAMS is not used. Hope this helps.

With the latest compiler + drivers on Sunspot (the multi-stream test passes as expected)

sunspot_build_latest_module
#!/bin/bash

module purge
module use /soft/testing/modulefiles/
module load intel-UMD23.05.25593.11/23.05.25593.11
module load dpcpp-master
module load spack cmake
module list

../../cmakeclean.sh

unset GATOR_DISABLE

export CC=`which clang`
export CXX=`which clang++`
export FC=`which gfortran`
unset CXXFLAGS
unset FFLAGS

cmake -DYAKL_ARCH="SYCL" \
-DYAKL_SYCL_FLAGS="-O3 -DYAKL_ENABLE_STREAMS" \
-DCMAKE_CXX_FLAGS="-O3 -fsycl -sycl-std=2020 -fsycl-unnamed-lambda -fsycl-device-code-split=per_kernel -fsycl-targets=spir64_gen -Xsycl-target-backend \"-device 12.60.7\"" \
-DYAKL_F90_FLAGS="-O3" \
-DYAKL_C_FLAGS="-O3"   \
../../..

make -j
ctest --no-tests=error

Test log for the above build
Test project /lus/gila/projects/CSC249ADSE15_CNDA/abagusetty/yakl_stream/unit/build/machines/jlse
      Start  1: CArray_test
 1/17 Test  #1: CArray_test ......................   Passed    0.10 sec
      Start  2: FArray_test
 2/17 Test  #2: FArray_test ......................   Passed    0.08 sec
      Start  3: Gator_test
 3/17 Test  #3: Gator_test .......................   Passed    0.10 sec
      Start  4: Random_test
 4/17 Test  #4: Random_test ......................   Passed    0.07 sec
      Start  5: FFT_test
 5/17 Test  #5: FFT_test .........................   Passed    2.24 sec
      Start  6: Reductions_test
 6/17 Test  #6: Reductions_test ..................   Passed    0.10 sec
      Start  7: Atomics_test
 7/17 Test  #7: Atomics_test .....................   Passed    0.07 sec
      Start  8: Pentadiagonal_test
 8/17 Test  #8: Pentadiagonal_test ...............   Passed    0.01 sec
      Start  9: Tridiagonal_test
 9/17 Test  #9: Tridiagonal_test .................   Passed    0.01 sec
      Start 10: Lambda_test
10/17 Test #10: Lambda_test ......................   Passed    0.06 sec
      Start 11: Fortran_Link_test
11/17 Test #11: Fortran_Link_test ................Subprocess aborted***Exception:   0.29 sec
      Start 12: Fortran_Gator_test
12/17 Test #12: Fortran_Gator_test ...............   Passed    0.11 sec
      Start 13: OpenMP_Regions_test
13/17 Test #13: OpenMP_Regions_test ..............   Passed    0.06 sec
      Start 14: Intrinsics_test
14/17 Test #14: Intrinsics_test ..................   Passed    0.09 sec
      Start 15: ParForC_test
15/17 Test #15: ParForC_test .....................   Passed    0.06 sec
      Start 16: ParForFortran_test
16/17 Test #16: ParForFortran_test ...............   Passed    0.06 sec
      Start 17: Streams_test
17/17 Test #17: Streams_test .....................   Passed    2.94 sec

94% tests passed, 1 tests failed out of 17

Total Test time (real) =   6.50 sec

The following tests FAILED:
	 11 - Fortran_Link_test (Subprocess aborted)
Errors while running CTest
Output from these tests are in: /lus/gila/projects/CSC249ADSE15_CNDA/abagusetty/yakl_stream/unit/build/machines/jlse/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
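Since the only failure in this log is Fortran_Link_test rather than Streams_test, it can help to pull the failing names out of the summary and re-run just those. A rough sketch, assuming the summary lines look like the ones above; `failed_tests` is a hypothetical helper, not a ctest feature:

```shell
# Hypothetical helper: read ctest summary lines on stdin and print the
# names of tests whose result column is anything other than "Passed".
failed_tests() {
  grep -E '^ *[0-9]+/[0-9]+ Test +#[0-9]+:' |
    grep -v ' Passed ' |
    sed -E 's/^.*Test +#[0-9]+: +([A-Za-z0-9_]+).*$/\1/'
}

# Usage (from the build directory):
#   ctest --no-tests=error | failed_tests
#   ctest -R '^Fortran_Link_test$' --output-on-failure
```

The second command uses ctest's regular-expression filter (-R) to re-run a single test with its output shown.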

Using the defaults (jlse_gpu_O3_AoT_PVC.sh): the multi-stream test fails with the default SDK, which is expected.

Test log with the default SDK
abagusetty@x1921c0s2b0n0 /lus/gila/projects/CSC249ADSE15_CNDA/abagusetty/yakl_stream/unit/build/machines/jlse (sycl_stream_fortranlink) $ ./Streams/Streams 
Running on Intel(R) Graphics [0x0bd6]
3
5
Pool Memory High Water Mark:       1610612736
Pool Memory High Water Efficiency: 0.75

All the above tests

Current main still fails for me on a JLSE florentia-debug node using jlse_gpu_O3.sh.