beehive-lab/TornadoVM

Seg Fault Error using Level Zero Compute Runtime 24.09.28717.12 and Profiler On


Describe the bug

The JVM crashes with a segmentation fault when running with the profiler enabled. The crash happens during the levelzero-jni (Level Zero JNI) call to zeEventPoolCreate, inside the Intel GPU driver library (libze_intel_gpu.so.1). The error can also be reproduced using levelzero-jni as a standalone library.

The error stack is as follows:

```
Stack: [0x00007f451bc74000,0x00007f451bd74000],  sp=0x00007f451bd71ac8,  free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libze_intel_gpu.so.1+0x1300e5]
C  [libze_intel_gpu.so.1+0x114bc6]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  uk.ac.manchester.tornado.drivers.spirv.levelzero.LevelZeroContext.zeEventPoolCreate_native(JLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolDescriptor;IJLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolHandle;)I+0 beehive.levelzero.jni@0.1.3
j  uk.ac.manchester.tornado.drivers.spirv.levelzero.LevelZeroContext.zeEventPoolCreate(JLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolDescriptor;IJLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolHandle;)I+9 beehive.levelzero.jni@0.1.3
j  uk.ac.manchester.tornado.drivers.spirv.timestamps.LevelZeroKernelTimeStamp.createEventPoolAndEvents(Luk/ac/manchester/tornado/drivers/spirv/levelzero/LevelZeroContext;Luk/ac/manchester/tornado/drivers/spirv/levelzero/LevelZeroDevice;Luk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolHandle;IILuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventHandle;)V+35 tornado.drivers.spirv@1.0.4-dev
j  uk.ac.manchester.tornado.drivers.spirv.timestamps.LevelZeroKernelTimeStamp.createEventTimer()V+59 tornado.drivers.spirv@1.0.4-dev
j  uk.ac.manchester.tornado.drivers.spirv.graal.SPIRVLevelZeroInstalledCode.launchKernelWithLevelZero(JLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeKernelHandle;Luk/ac/manchester/tornado/drivers/spirv/graal/SPIRVLevelZeroInstalledCode$DeviceThreadScheduling;Luk/ac/manchester/tornado/drivers/spirv/graal/SPIRVLevelZeroInstalledCode$ThreadBlockDispatcher;)V+131 tornado.drivers.spirv@1.0.4-dev
j  uk.ac.manchester.tornado.drivers.spirv.graal.SPIRVLevelZeroInstalledCode.launchWithoutDependencies(JLuk/ac/manchester/tornado/runtime/common/KernelStackFrame;Luk/ac/manchester/tornado/api/memory/XPUBuffer;Luk/ac/manchester/tornado/runtime/tasks/meta/TaskMetaData;J)I+194 tornado.drivers.spirv@1.0.4-dev
j  uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.executeLaunch(Ljava/lang/StringBuilder;IIIJJLuk/ac/manchester/tornado/runtime/interpreter/TornadoVMInterpreter$XPUExecutionFrame;)I+904 tornado.runtime@1.0.4-dev
```
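To read the top Java frame, the JVM method descriptor of zeEventPoolCreate_native can be decoded into Java types. Below is a minimal sketch; the class DescriptorDecoder is a hypothetical helper written for this issue, not part of TornadoVM or levelzero-jni. It decodes the descriptor to (long, ZeEventPoolDescriptor, int, long, ZeEventPoolHandle) returning int, which lines up parameter for parameter with the Level Zero C function zeEventPoolCreate(hContext, desc, numDevices, phDevices, phEventPool). Note that the fourth argument, which the C API declares as a pointer to an array of numDevices device handles, reaches the native method as a single long; the trace alone does not establish whether this is related to the crash.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of TornadoVM or levelzero-jni): decodes a JVM
 * method descriptor, as printed in HotSpot crash-log Java frames, into
 * readable parameter and return types.
 */
public final class DescriptorDecoder {

    private DescriptorDecoder() {}

    /** Parses one type starting at pos[0] and advances pos[0] past it. */
    private static String parseType(String d, int[] pos) {
        char c = d.charAt(pos[0]++);
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            case '[': return parseType(d, pos) + "[]"; // array of the following type
            case 'L': {
                // Object type "Lpkg/Name;": keep only the simple class name
                int end = d.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated class name in " + d);
                }
                String internalName = d.substring(pos[0], end);
                pos[0] = end + 1;
                return internalName.substring(internalName.lastIndexOf('/') + 1);
            }
            default:
                throw new IllegalArgumentException("Unexpected '" + c + "' in " + d);
        }
    }

    /** Returns the parameter types, in order, of a descriptor such as "(JI)I". */
    public static List<String> parameterTypes(String descriptor) {
        if (!descriptor.startsWith("(")) {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        List<String> params = new ArrayList<>();
        int[] pos = {1};
        while (descriptor.charAt(pos[0]) != ')') {
            params.add(parseType(descriptor, pos));
        }
        return params;
    }

    /** Returns the return type of a method descriptor. */
    public static String returnType(String descriptor) {
        int[] pos = {descriptor.indexOf(')') + 1};
        return parseType(descriptor, pos);
    }

    public static void main(String[] args) {
        // Descriptor of zeEventPoolCreate_native, copied from the top Java frame
        String desc = "(JLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolDescriptor;"
                + "IJLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolHandle;)I";
        System.out.println(parameterTypes(desc) + " -> " + returnType(desc));
    }
}
```

Running main prints [long, ZeEventPoolDescriptor, int, long, ZeEventPoolHandle] -> int. The same helper works for the other frames in the trace, for example the createEventPoolAndEvents descriptor.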

The issue reproduces only with recent versions of the Intel Compute Runtime (https://github.com/intel/compute-runtime), such as 24.09.28717.12 for Fedora 39.

With an earlier version (23.05.25593.18), it runs without errors.

How To Reproduce

```bash
# Using the levelzero-jni repo: https://github.com/beehive-lab/levelzero-jni
./scripts/events.sh
```

Expected behavior

Run without errors.

Computing system setup:

  • OS: Fedora 39
  • Driver: Intel Compute Runtime 24.09.28717.12 (crashes); 23.05.25593.18 works
  • TornadoVM commit id: 80d9a61
  • Linux Kernel: Linux 6.8.6-200.fc39.x86_64

Additional context

n/a