mrnorman/YAKL

Intel GPU problem with JLSE

mrnorman opened this issue · 2 comments

@abagusetty, when I turn on -DYAKL_DEBUG, I'm getting new errors with SYCL on JLSE. See the reproducer below:

qsub -I -t 120 -n 1 -q iris
cd YAKL/unit/build/machines/jlse
source jlse_gpu_debug.sh
make -j
make test

More specifically, taking the CArray test as an example:

module load gdb
gdb ./CArray/CArray

(gdb) run

[New Thread 0x7ffff285a700 (LWP 10568)]
Running on Intel(R) Iris(TM) Pro Graphics P580 [0x193a]
Create Pool
INFORM: Automatically inserting fence() after every parallel_for

Thread 1 "CArray" received signal SIGSEGV, Segmentation fault.
IGC::LowerGPCallArg::processCallArg (this=<optimized out>, M=...) at /tmp/igc-1.0.8744-Release-2021.10.11/igc/IGC/Compiler/CISACodeGen/ResolveGAS.cpp:1227
1227	/tmp/igc-1.0.8744-Release-2021.10.11/igc/IGC/Compiler/CISACodeGen/ResolveGAS.cpp: No such file or directory.

(gdb) bt

#0  IGC::LowerGPCallArg::processCallArg (this=<optimized out>, M=...) at /tmp/igc-1.0.8744-Release-2021.10.11/igc/IGC/Compiler/CISACodeGen/ResolveGAS.cpp:1227
#1  0x00007fffe3503408 in IGC::LowerGPCallArg::runOnModule (this=0x20f2ec0, M=...) at /tmp/igc-1.0.8744-Release-2021.10.11/igc/IGC/Compiler/CISACodeGen/ResolveGAS.cpp:1045
#2  0x00007fffe42359c8 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /soft/compilers/intel-igc/igc-1.0.8744-Release-2021.10.11/lib64/libigc.so.1
#3  0x00007fffe33b7768 in IGC::CommonOCLBasedPasses (pContext=0x7ffffffec0a0, BuiltinGenericModule=..., BuiltinSizeModule=...) at /tmp/igc-1.0.8744-Release-2021.10.11/igc/IGC/AdaptorOCL/UnifyIROCL.cpp:539
#4  0x00007fffe33b7dff in IGC::UnifyIROCL (pContext=<optimized out>, BuiltinGenericModule=..., BuiltinSizeModule=...)
    at /soft/packaging/spack-builds/linux-opensuse_leap15-x86_64/gcc-10.2.0/gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/include/c++/10.2.0/bits/unique_ptr.h:172
#5  0x00007fffe3390205 in TC::TranslateBuild (pInputArgs=pInputArgs@entry=0x7ffffffecc40, pOutputArgs=pOutputArgs@entry=0x7ffffffecc10, inputDataFormatTemp=TC::TB_DATA_FORMAT_SPIR_V, IGCPlatform=..., 
    profilingTimerResolution=<optimized out>) at /soft/packaging/spack-builds/linux-opensuse_leap15-x86_64/gcc-10.2.0/gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/include/c++/10.2.0/bits/unique_ptr.h:172
#6  0x00007fffe3460306 in IGC::IgcOclTranslationCtx<0ul>::Impl::Translate (this=0x298e3f0, outVersion=<optimized out>, src=<optimized out>, specConstantsIds=<optimized out>, specConstantsValues=<optimized out>, 
    options=<optimized out>, internalOptions=<optimized out>, tracingOptions=<optimized out>, tracingOptionsCount=<optimized out>, gtPinInput=<optimized out>)
    at /tmp/igc-1.0.8744-Release-2021.10.11/igc/IGC/AdaptorOCL/ocl_igc_interface/impl/igc_ocl_translation_ctx_impl.h:282
#7  0x00007ffff2e4af7a in NEO::CompilerInterface::build(NEO::Device const&, NEO::TranslationInput const&, NEO::TranslationOutput&) ()
   from /soft/libraries/intel-level-zero/compute-runtime/21.40.21182-Release-2021.10.11/lib64/libze_intel_gpu.so.1
#8  0x00007ffff2dd082a in L0::ModuleTranslationUnit::buildFromSpirV(char const*, unsigned int, char const*, char const*, _ze_module_constants_t const*) ()
   from /soft/libraries/intel-level-zero/compute-runtime/21.40.21182-Release-2021.10.11/lib64/libze_intel_gpu.so.1
#9  0x00007ffff2dd1f36 in L0::ModuleImp::initialize(_ze_module_desc_t const*, NEO::Device*) () from /soft/libraries/intel-level-zero/compute-runtime/21.40.21182-Release-2021.10.11/lib64/libze_intel_gpu.so.1
#10 0x00007ffff2dd25a4 in L0::Module::create(L0::Device*, _ze_module_desc_t const*, L0::ModuleBuildLog*, L0::ModuleType) ()
   from /soft/libraries/intel-level-zero/compute-runtime/21.40.21182-Release-2021.10.11/lib64/libze_intel_gpu.so.1
#11 0x00007ffff2db6ef9 in L0::DeviceImp::createModule(_ze_module_desc_t const*, _ze_module_handle_t**, _ze_module_build_log_handle_t**, L0::ModuleType) ()
   from /soft/libraries/intel-level-zero/compute-runtime/21.40.21182-Release-2021.10.11/lib64/libze_intel_gpu.so.1
#12 0x00007ffff380c707 in compileOrBuild(_pi_program*, unsigned int, _pi_device* const*, char const*) ()
   from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libpi_level_zero.so
#13 0x00007ffff380c9eb in piProgramBuild () from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libpi_level_zero.so
#14 0x00007ffff6dada8b in cl::sycl::detail::ProgramManager::build(std::unique_ptr<_pi_program, _pi_result (*)(_pi_program*)>, std::shared_ptr<cl::sycl::detail::context_impl>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _pi_device* const&, std::map<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*>, _pi_program*, std::less<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> >, std::allocator<std::pair<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> const, _pi_program*> > >&, unsigned int) ()
   from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#15 0x00007ffff6da753e in cl::sycl::detail::ProgramManager::getBuiltPIProgram(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*, bool) ()
   from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#16 0x00007ffff6da87f3 in cl::sycl::detail::ProgramManager::getOrCreateKernel(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*) ()
   from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#17 0x00007ffff6de8b75 in cl::sycl::detail::ExecCGCommand::enqueueImp() () from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#18 0x00007ffff6df16cc in cl::sycl::detail::Scheduler::addCG(std::unique_ptr<cl::sycl::detail::CG, std::default_delete<cl::sycl::detail::CG> >, std::shared_ptr<cl::sycl::detail::queue_impl>) ()
   from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#19 0x00007ffff6e23fbc in cl::sycl::handler::finalize() () from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#20 0x00007ffff6e48db3 in cl::sycl::detail::queue_impl::submit_impl(std::function<void (cl::sycl::handler&)> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::code_location const&, std::function<void (bool, bool, cl::sycl::event&)> const*) () from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#21 0x00007ffff6e480d5 in cl::sycl::queue::submit_impl(std::function<void (cl::sycl::handler&)>, cl::sycl::detail::code_location const&) ()
   from /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/lib/libsycl.so.5
#22 0x000000000052a43a in cl::sycl::queue::submit<cl::sycl::queue::parallel_for_impl<cl::sycl::detail::auto_name, yakl::c::parallel_for_sycl<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int)::{lambda(cl::sycl::id<1>)#1}, 1>(cl::sycl::range<1>, yakl::c::parallel_for_sycl<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int)::{lambda(cl::sycl::id<1>)#1}, cl::sycl::detail::code_location const&)::{lambda(cl::sycl::handler&)#1}> (this=0xb8c5d0, CGF=..., CodeLoc=...)
    at /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/bin/../include/sycl/CL/sycl/queue.hpp:274
#23 0x000000000052a3af in cl::sycl::queue::parallel_for_impl<cl::sycl::detail::auto_name, yakl::c::parallel_for_sycl<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int)::{lambda(cl::sycl::id<1>)#1}, 1>(cl::sycl::range<1>, yakl::c::parallel_for_sycl<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int)::{lambda(cl::sycl::id<1>)#1}, cl::sycl::detail::code_location const&) (this=0xb8c5d0, NumWorkItems=..., KernelFunc=..., CodeLoc=...)
    at /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/bin/../include/sycl/CL/sycl/queue.hpp:1100
#24 0x000000000052a31a in cl::sycl::queue::parallel_for<cl::sycl::detail::auto_name, yakl::c::parallel_for_sycl<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int)::{lambda(cl::sycl::id<1>)#1}>(cl::sycl::range<1>, yakl::c::parallel_for_sycl<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int)::{lambda(cl::sycl::id<1>)#1} const&, cl::sycl::detail::code_location const&) (this=0xb8c5d0, NumWorkItems=..., KernelFunc=..., CodeLoc=...)
    at /soft/restricted/CNDA/sdk/2021.10.30.001/oneapi/compiler/pseudo-20211026/compiler/linux/bin/../include/sycl/CL/sycl/queue.hpp:736
#25 0x000000000052a25d in yakl::c::parallel_for_sycl<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int) (bounds=..., f=..., vectorSize=128) at /home/ac.normanmr/YAKL/YAKL_parallel_for_c.h:605
#26 0x000000000052a123 in yakl::c::parallel_for<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}, 1, true>(yakl::c::Bounds<1, true> const&, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int) (bounds=..., f=..., vectorSize=128) at /home/ac.normanmr/YAKL/YAKL_parallel_for_c.h:886
#27 0x000000000052a06c in yakl::c::parallel_for<yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1}>(yakl::c::LBnd, yakl::memset<float, 1, 1, 1, float>(yakl::Array<float, 1, 1, 1>&, float)::{lambda(int)#1} const&, int) (bnd=..., f=..., vectorSize=128) at /home/ac.normanmr/YAKL/YAKL_parallel_for_c.h:920
#28 0x0000000000516680 in yakl::memset<float, 1, 1, 1, float> (arr=..., val=0) at /home/ac.normanmr/YAKL/YAKL.h:194
#29 0x00000000004cad6d in main () at /home/ac.normanmr/YAKL/unit/CArray/CArray.cpp:65

I am also looking at this test case for a possible .wait() issue and a segfault at destruction. Also, the right debugger is gdb-oneapi, which has GPU support; native gdb doesn't recognize SYCL.
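
The debugger choice could be scripted roughly as follows. This is only a sketch: `pick_debugger` is a hypothetical helper name, and it assumes `gdb-oneapi` is on `PATH` once the oneAPI environment is loaded (how that is done on JLSE is not stated in this thread). It falls back to plain `gdb` otherwise.

```shell
#!/bin/sh
# Hypothetical helper: prefer the GPU-aware gdb-oneapi when it is available,
# otherwise fall back to plain gdb (which will not understand SYCL kernels).
pick_debugger() {
  if command -v gdb-oneapi >/dev/null 2>&1; then
    echo gdb-oneapi
  else
    echo gdb
  fi
}

DEBUGGER=$(pick_debugger)
echo "Using debugger: $DEBUGGER"

# To rerun the failing test under the selected debugger, using the path
# from the reproducer in this issue, one would then run:
#   "$DEBUGGER" ./CArray/CArray
```

The launch line is left as a comment so the snippet can be sourced or run without starting an interactive session.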

#45 addresses this issue