Seg Fault Error using Level Zero Compute Runtime 24.09.28717.12 and Profiler On
jjfumero commented
Describe the bug
A segmentation fault occurs in the Level Zero JNI library (levelzero-jni) when running with the profiler enabled. The crash is triggered by the JNI call to the function zeEventPoolCreate. It can also be reproduced using levelzero-jni as a standalone library.
The error stack is as follows:
Stack: [0x00007f451bc74000,0x00007f451bd74000], sp=0x00007f451bd71ac8, free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libze_intel_gpu.so.1+0x1300e5]
C [libze_intel_gpu.so.1+0x114bc6]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j uk.ac.manchester.tornado.drivers.spirv.levelzero.LevelZeroContext.zeEventPoolCreate_native(JLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolDescriptor;IJLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolHandle;)I+0 beehive.levelzero.jni@0.1.3
j uk.ac.manchester.tornado.drivers.spirv.levelzero.LevelZeroContext.zeEventPoolCreate(JLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolDescriptor;IJLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolHandle;)I+9 beehive.levelzero.jni@0.1.3
j uk.ac.manchester.tornado.drivers.spirv.timestamps.LevelZeroKernelTimeStamp.createEventPoolAndEvents(Luk/ac/manchester/tornado/drivers/spirv/levelzero/LevelZeroContext;Luk/ac/manchester/tornado/drivers/spirv/levelzero/LevelZeroDevice;Luk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventPoolHandle;IILuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeEventHandle;)V+35 tornado.drivers.spirv@1.0.4-dev
j uk.ac.manchester.tornado.drivers.spirv.timestamps.LevelZeroKernelTimeStamp.createEventTimer()V+59 tornado.drivers.spirv@1.0.4-dev
j uk.ac.manchester.tornado.drivers.spirv.graal.SPIRVLevelZeroInstalledCode.launchKernelWithLevelZero(JLuk/ac/manchester/tornado/drivers/spirv/levelzero/ZeKernelHandle;Luk/ac/manchester/tornado/drivers/spirv/graal/SPIRVLevelZeroInstalledCode$DeviceThreadScheduling;Luk/ac/manchester/tornado/drivers/spirv/graal/SPIRVLevelZeroInstalledCode$ThreadBlockDispatcher;)V+131 tornado.drivers.spirv@1.0.4-dev
j uk.ac.manchester.tornado.drivers.spirv.graal.SPIRVLevelZeroInstalledCode.launchWithoutDependencies(JLuk/ac/manchester/tornado/runtime/common/KernelStackFrame;Luk/ac/manchester/tornado/api/memory/XPUBuffer;Luk/ac/manchester/tornado/runtime/tasks/meta/TaskMetaData;J)I+194 tornado.drivers.spirv@1.0.4-dev
j uk.ac.manchester.tornado.runtime.interpreter.TornadoVMInterpreter.executeLaunch(Ljava/lang/StringBuilder;IIIJJLuk/ac/manchester/tornado/runtime/interpreter/TornadoVMInterpreter$XPUExecutionFrame;)I+904 tornado.runtime@1.0.4-dev
I can only reproduce this issue with a recent version of the Intel Compute Runtime (https://github.com/intel/compute-runtime), such as 24.09.28717.12 for Fedora 39.
If I use a previous version (23.05.25593.18), there are no errors.
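When triaging on another machine, it may help to check where the installed runtime falls relative to the two versions reported above. The following is a minimal sketch, assuming the version is a dotted string of decimal fields. The class and method names are hypothetical and are not part of TornadoVM or levelzero-jni. Versions between the two reported ones are untested, so the comparison only orders them and does not say where the regression was introduced.

```java
import java.util.Arrays;

// Hypothetical helper (not part of TornadoVM or levelzero-jni):
// orders Intel Compute Runtime version strings such as "24.09.28717.12".
final class ComputeRuntimeVersion {

    // Versions taken from this report.
    static final String REPORTED_WORKING = "23.05.25593.18";
    static final String REPORTED_FAILING = "24.09.28717.12";

    // Split "24.09.28717.12" into {24, 9, 28717, 12}.
    static int[] parse(String version) {
        return Arrays.stream(version.trim().split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Negative, zero, or positive, like Comparator.compare.
    // Missing trailing fields are treated as 0.
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Assumption: only the two versions in this report have been checked,
    // so anything else is labelled "untested" rather than guessed.
    static String classify(String installed) {
        if (compare(installed, REPORTED_WORKING) == 0) {
            return "reported-working";
        }
        if (compare(installed, REPORTED_FAILING) == 0) {
            return "reported-failing";
        }
        return "untested";
    }
}
```

For example, `ComputeRuntimeVersion.compare("24.09.28717.12", "23.05.25593.18")` returns a positive value, and `ComputeRuntimeVersion.classify("24.09.28717.12")` returns `"reported-failing"`.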
How To Reproduce
# Using the levelzero-jni repo: https://github.com/beehive-lab/levelzero-jni
./scripts/events.sh
Expected behavior
Run without errors.
Computing system setup (please complete the following information):
- OS: Fedora 39
- OpenCL and Driver versions
- If applicable, PTX and CUDA Driver versions
- If applicable, Level Zero & SPIR-V Versions:
- TornadoVM commit id: 80d9a61
- Linux Kernel: 6.8.6-200.fc39.x86_64
Additional context
N/A.