Free `KernelState` in finalizer
pxl-th opened this issue · 2 comments
pxl-th commented
We should probably free the `KernelState` memory in the `HostKernel` or `ROCKernel` finalizer.
Currently it is only freed in `cleanup!` for `ROCKernelSignal`, which means the memory is released only if you create a signal (and wait on it).
If you don't, the memory leaks.
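MWE:
using AMDGPU

function empty()
    return nothing
end

function main()
    for i in 1:1_000_000
        # Compile and set up the kernel without launching it. No signal is
        # created, so the kernel state allocated here is never freed.
        @roc launch=false empty()
    end
end

main()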
ERROR: LoadError: HSA error (code #4104, HSA_STATUS_ERROR_OUT_OF_RESOURCES: The runtime failed to allocate the necessary resources. This error may also occur when the core runtime library needs to spawn threads or create internal OS-specific events.)
Stacktrace:
  [1] check
    @ ~/.julia/dev/AMDGPU/src/runtime/error.jl:34 [inlined]
  [2] alloc_or_retry!(f::AMDGPU.Runtime.Mem.var"#9#10"{AMDGPU.Runtime.ROCMemoryRegion, Int64, Base.RefValue{Ptr{Nothing}}})
    @ AMDGPU.Runtime.Mem ~/.julia/dev/AMDGPU/src/runtime/memory.jl:355
  [3] alloc(device::ROCDevice, region::AMDGPU.Runtime.ROCMemoryRegion, bytesize::Int64)
    @ AMDGPU.Runtime.Mem ~/.julia/dev/AMDGPU/src/runtime/memory.jl:379
  [4] alloc(device::ROCDevice, bytesize::Int64; coherent::Bool, slow_fallback::Bool)
    @ AMDGPU.Runtime.Mem ~/.julia/dev/AMDGPU/src/runtime/memory.jl:309
  [5] (::AMDGPU.Compiler.var"#allocate_kernel_state#59")(device::ROCDevice)
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:155
  [6] rocfunction(f::typeof(empty), tt::Type; name::Nothing, device::ROCDevice, global_hooks::NamedTuple{(), Tuple{}})
    @ AMDGPU.Compiler ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:165
  [7] rocfunction
    @ ~/.julia/dev/AMDGPU/src/compiler/codegen.jl:140 [inlined]
  [8] macro expansion
    @ ~/.julia/dev/AMDGPU/src/highlevel.jl:433 [inlined]
  [9] main()
    @ Main ~/.julia/dev/gcn_test.jl:34
 [10] top-level scope
    @ ~/.julia/dev/gcn_test.jl:39
in expression starting at /home/pxl-th/.julia/dev/gcn_test.jl:39
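To illustrate the proposal, here is a minimal sketch of tying a buffer's lifetime to its owning object with a finalizer, so that it is released even if the kernel is never launched or waited on. `MockKernel` and `free_state!` are hypothetical names made up for this example, and `Libc.malloc`/`Libc.free` stand in for the device allocation; none of this is the actual AMDGPU.jl implementation.

```julia
# Hypothetical stand-in for a kernel object that owns a state buffer.
# `MockKernel` and `free_state!` are illustrative names, not AMDGPU.jl API.
mutable struct MockKernel
    state::Ptr{Cvoid}   # stand-in for the KernelState allocation

    function MockKernel(bytesize::Integer)
        k = new(Libc.malloc(bytesize))
        # Release the buffer when the object is garbage collected,
        # whether or not the kernel was ever launched or waited on.
        finalizer(free_state!, k)
        return k
    end
end

function free_state!(k::MockKernel)
    if k.state != C_NULL
        Libc.free(k.state)
        k.state = C_NULL   # guard against a double free
    end
    return nothing
end

# Mirrors the MWE: create many kernels and never "launch" them.
function mock_main()
    for i in 1:1_000
        MockKernel(64)
    end
    GC.gc()   # unreachable kernels have their finalizers run
    return nothing
end
```

Setting the pointer to `C_NULL` after freeing keeps an explicit cleanup path (such as the existing `cleanup!`) and the finalizer from freeing the same buffer twice.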
jpsamaroo commented
I'm working on fixing this on jps/dev, but one thing I'm realizing is that this is going to remove our ability to reuse kernarg buffers (because the kernel state is necessarily unique per kernel, to provide unique exception flags). There are several possible ways to address this, but we can probably just rely on the pooling kernarg allocator for now.
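For readers unfamiliar with the idea, a pooling allocator recycles freed buffers of the same size instead of returning them to the runtime, which keeps per-kernel allocations cheap even when each kernel needs its own buffer. The sketch below is a hypothetical size-bucketed pool on host memory; `BufferPool`, `acquire!`, `release!`, and `drain!` are assumed names for illustration and do not describe the allocator AMDGPU.jl actually uses.

```julia
# Hypothetical size-bucketed pool; not the allocator used by AMDGPU.jl.
struct BufferPool
    free::Dict{Int, Vector{Ptr{Cvoid}}}   # bytesize => idle buffers
end
BufferPool() = BufferPool(Dict{Int, Vector{Ptr{Cvoid}}}())

# Hand out an idle buffer of the requested size, allocating only on a miss.
function acquire!(pool::BufferPool, bytesize::Int)
    idle = get!(() -> Ptr{Cvoid}[], pool.free, bytesize)
    return isempty(idle) ? Libc.malloc(bytesize) : pop!(idle)
end

# Return a buffer to the pool instead of freeing it.
function release!(pool::BufferPool, ptr::Ptr{Cvoid}, bytesize::Int)
    push!(get!(() -> Ptr{Cvoid}[], pool.free, bytesize), ptr)
    return nothing
end

# Free every idle buffer, e.g. at shutdown or under memory pressure.
function drain!(pool::BufferPool)
    for idle in values(pool.free)
        foreach(Libc.free, idle)
        empty!(idle)
    end
    return nothing
end
```

With such a pool, a kernel finalizer would call `release!` rather than freeing directly, so the next kernel of the same argument size picks up the recycled buffer.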
jpsamaroo commented
Fixed in the latest push to jps/dev.