Support for multi-GPU nodes broken in 0.7
Keluaa opened this issue · 3 comments
Keluaa commented
I am on a 2 GPU node:
julia> using AMDGPU
ERROR: InitError: BoundsError: attempt to access 1-element Vector{AMDGPU.HIP.HIPDevice} at index [2]
Stacktrace:
[1] getindex
@ ./essentials.jl:13 [inlined]
[2] AMDGPU.HIP.HIPDevice(device_id::Int64)
@ AMDGPU.HIP /briandl/.julia/packages/AMDGPU/aIM2W/src/hip/device.jl:12
[3] devices()
@ AMDGPU.HIP /briandl/.julia/packages/AMDGPU/aIM2W/src/hip/device.jl:87
[4] __init__()
@ AMDGPU /briandl/.julia/packages/AMDGPU/aIM2W/src/AMDGPU.jl:208
[5] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
@ Base ./loading.jl:1115
[6] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
@ Base ./loading.jl:1061
[7] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
@ Base ./loading.jl:1506
[8] _require(pkg::Base.PkgId, env::String)
@ Base ./loading.jl:1783
[9] _require_prelocked(uuidkey::Base.PkgId, env::String)
@ Base ./loading.jl:1660
[10] macro expansion
@ ./loading.jl:1648 [inlined]
[11] macro expansion
@ ./lock.jl:267 [inlined]
[12] require(into::Module, mod::Symbol)
@ Base ./loading.jl:1611
during initialization of module AMDGPU
I believe this is because ALL_DEVICES is filled by HIP.devices(), which relies on the HIPDevice constructor, which may be missing a bounds check.
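For illustration, here is a minimal sketch of the failure mode and of one possible bounds check. It does not use AMDGPU.jl internals: `MockDevice`, `DEVICE_CACHE`, `unchecked_device`, and `checked_device` are hypothetical names, and the cache is assumed to be a plain vector that was sized for a single device, as the 1-element vector in the error suggests.

```julia
# Hypothetical stand-in for a device handle; not the real AMDGPU.HIP.HIPDevice.
struct MockDevice
    device_id::Int
end

# Hypothetical cache of device handles, deliberately sized for one device
# to mimic the 1-element vector from the error message.
const DEVICE_CACHE = MockDevice[MockDevice(1)]

# Unchecked lookup: throws BoundsError when device_id > length(DEVICE_CACHE).
unchecked_device(device_id::Int) = DEVICE_CACHE[device_id]

# Checked lookup: validate the id against the reported device count,
# then grow the cache on demand so every valid id has a slot.
function checked_device(device_id::Int, n_devices::Int)
    1 <= device_id <= n_devices ||
        throw(ArgumentError("device_id $device_id out of range 1:$n_devices"))
    for id in (length(DEVICE_CACHE) + 1):device_id
        push!(DEVICE_CACHE, MockDevice(id))
    end
    return DEVICE_CACHE[device_id]
end
```

With this sketch, `unchecked_device(2)` fails the same way as the stack trace above, while `checked_device(2, 2)` extends the cache and returns a handle for the second device. Whether the real fix should grow the cache or size it up front from the device count is a design choice for the package.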
pxl-th commented
Thanks!