JuliaGPU/AMDGPU.jl

error: ran out of registers during register allocation

leios opened this issue · 2 comments

leios commented

Ok, I'm going to be honest, I cannot create an MWE for this, but I can provide a replicator. I'm not expecting this issue to be fixed soon, but I wanted to make a note of it in case other people have the same issue.

Basically, when running some pieces of code on AMDGPU, I am kicked out of the REPL and given the error:

error: ran out of registers during register allocation

My replicator is a bit messy and involves running a few different packages together:

  1. Fable (an animation engine)
  2. LolliPeople (a specific set of shapes that resembles a lollipop / person)
  3. Backgrounds (A package with specific backgrounds for certain animations)

Here is my step-by-step replicator for my 6700XT:

git clone git@github.com:leios/Fable.jl.git
git clone git@github.com:leios/Fableios.git
cd Fable.jl
git fetch origin bounds_issue
git checkout bounds_issue
cd ../Fableios
git fetch origin bounds_issue
git checkout bounds_issue
cd Backgrounds.jl
julia --project

] # To enter Pkg mode

dev ../LolliPeople.jl
dev ../../Fable.jl

Backspace # To leave Pkg mode

using AMDGPU, Backgrounds
include("examples/crowd.jl")
crowd_example(1000, 1000; ArrayType = ROCArray)
error: ran out of registers during register allocation

Note that @vchuravy and I looked into this a while ago and found that things worked when running Julia with --check-bounds=no and also adding an unsafe ceil operation:

unsafe_ceil(T, x) = Base.unsafe_trunc(T, round(x, RoundUp))
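
As a rough illustration of why this helps: the checked `ceil(Int, x)` has to verify that the result fits in the target type and throws an `InexactError` otherwise, whereas `Base.unsafe_trunc` skips that check. The sketch below only compares the two on the CPU. `checked_ceil_throws` is a hypothetical helper written for this comparison and is not part of Fable or AMDGPU.jl, and the assumption that the extra error path is what inflates the GPU kernel is not confirmed here.

```julia
# Definition quoted from the issue text: round up in floating point,
# then convert without any range check.
unsafe_ceil(T, x) = Base.unsafe_trunc(T, round(x, RoundUp))

# Hypothetical helper (not from the issue): reports whether the checked
# ceil throws an InexactError for a given input.
function checked_ceil_throws(T, x)
    try
        ceil(T, x)
        return false
    catch err
        return err isa InexactError
    end
end

# For in-range values, both versions agree.
println(unsafe_ceil(Int, 2.3) == ceil(Int, 2.3))

# For values that cannot be represented, the checked version throws,
# while the result of unsafe_ceil is unspecified and must not be relied on.
println(checked_ceil_throws(Int, NaN))
```

The trade-off is that `unsafe_ceil` is only safe when the caller already knows the value is finite and in range for `T`.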

I am about to make a bunch of changes to force @inbounds in other areas of Fable to try to get rid of this issue without the --check-bounds=no option. Again, this might have been "my fault" for missing a few @inbounds here and there, but I figured I should document the error somewhere and let people know it's vaguely correlated to bounds checking.

pxl-th commented

Do you have an idea what kernel might cause this?

leios commented

To be honest, not exactly. It's something that only appears when I have a "sufficiently complicated" scene. I'm going to be going through and adding @inbounds blocks to a bunch of functions to try to determine which one might be causing the issue in particular soon (either later today or tomorrow).

When vchuravy was looking at the generated LLVM code, it kinda blew up without the unsafe ceil, so I suspect it's in the histogram functions? If so, maybe an MWE would be a histogram with a large number of bounds-checked calls to a global array?
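
A minimal CPU-only sketch of the kind of histogram binning described above, written as an assumption about the general pattern and not taken from Fable's actual kernels. The function names, the 1D layout, and the `lo`/`hi` range parameters are all hypothetical. The first version uses the checked `ceil` and bounds-checked indexing; the second swaps in `unsafe_ceil` and `@inbounds`, guarded by an explicit range test.

```julia
# Definition quoted from the issue text.
unsafe_ceil(T, x) = Base.unsafe_trunc(T, round(x, RoundUp))

# Hypothetical checked variant: ceil can throw InexactError and
# bins[idx] carries an implicit bounds check.
function histogram_checked!(bins, points, lo, hi)
    n = length(bins)
    for p in points
        idx = ceil(Int, (p - lo) / (hi - lo) * n)
        if 1 <= idx <= n
            bins[idx] += 1
        end
    end
    return bins
end

# Hypothetical unchecked variant: no error path from ceil, and the
# explicit range test above the access justifies the @inbounds.
function histogram_unchecked!(bins, points, lo, hi)
    n = length(bins)
    @inbounds for p in points
        idx = unsafe_ceil(Int, (p - lo) / (hi - lo) * n)
        if 1 <= idx <= n
            bins[idx] += 1
        end
    end
    return bins
end

points = [0.1, 0.5, 0.9, 0.95]
println(histogram_checked!(zeros(Int, 4), points, 0.0, 1.0))
println(histogram_unchecked!(zeros(Int, 4), points, 0.0, 1.0))
```

Both versions produce the same counts for finite, in-range inputs. Whether a GPU port of the checked version would actually reproduce the register allocation error is untested.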

I've spent weeks trying to create an MWE of this issue before, but couldn't... But I just had another idea to try just now, so I can poke around a bit more.