error: ran out of registers during register allocation
leios opened this issue · 2 comments
Ok, I'm going to be honest: I cannot create an MWE for this, but I can provide a replicator. I'm not expecting this issue to be fixed soon, but I wanted to make a note of it in case other people have the same issue.
Basically, when running some pieces of code on AMDGPU, I am kicked out of the REPL and given the error:
error: ran out of registers during register allocation
My replicator is a bit messy and involves running a few different packages together:
- Fable (an animation engine)
- LolliPeople (a specific set of shapes that resembles a lollipop / person)
- Backgrounds (a package with specific backgrounds for certain animations)
Here is my step-by-step replicator for my 6700XT:
git clone git@github.com:leios/Fable.jl.git
git clone git@github.com:leios/Fableios.git
cd Fable.jl
git fetch origin bounds_issue
git checkout bounds_issue
cd ../Fableios
git fetch origin bounds_issue
git checkout bounds_issue
cd Backgrounds.jl
julia --project
] # To enter Pkg mode
dev ../LolliPeople.jl
dev ../../Fable.jl
Backspace # To leave Pkg mode
using AMDGPU, Backgrounds
include("examples/crowd.jl")
crowd_example(1000, 1000; ArrayType = ROCArray)
error: ran out of registers during register allocation
Note that @vchuravy and I looked into this a while ago and found that things worked when running Julia in --check-bounds=no mode and also adding in an unsafe ceil operation:
unsafe_ceil(T, x) = Base.unsafe_trunc(T, round(x, RoundUp))
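For anyone wondering why this helps, here is a small CPU-only sketch of the difference, assuming the relevant point is that the checked `ceil(T, x)` can throw an `InexactError` while `Base.unsafe_trunc` skips that check. The input values are just illustrative, and this is not taken from Fable itself.

```julia
# Same definition as above
unsafe_ceil(T, x) = Base.unsafe_trunc(T, round(x, RoundUp))

x = 2.3f0

# For representable values, both versions agree
@assert ceil(Int32, x) == 3
@assert unsafe_ceil(Int32, x) == 3

# The checked version has to be able to throw when the value cannot be
# represented in the target type (NaN here), so it carries an error path
threw = try
    ceil(Int32, NaN32)
    false
catch err
    err isa InexactError
end
@assert threw

# The unchecked version never throws; for NaN or out-of-range input the
# result is an arbitrary value, so it is only safe when x is known to be valid
y = unsafe_ceil(Int32, NaN32)
@assert y isa Int32
```

The trade-off is that `unsafe_ceil` silently returns garbage for invalid input, so it only makes sense where the argument is already known to be in range.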
I am about to make a bunch of changes to force the @inbounds in other areas of Fable to try to get rid of this issue without the --check-bounds=no option. Again, this might have been "my fault" for missing a few @inbounds here and there, but I figured I should document the error somewhere and let people know it's vaguely correlated to bounds checking.
Do you have an idea what kernel might cause this?
To be honest, not exactly. It's something that only appears when I have a "sufficiently complicated" scene. I'm going to go through and add @inbounds blocks to a bunch of functions soon (either later today or tomorrow) to try to determine which one in particular might be causing the issue.
When vchuravy was looking at the generated LLVM code, it kinda blew up without the unsafe ceil, so I suspect it's in the histogram functions? If so, maybe an MWE would be a histogram with a large number of bounds-checked calls to a global array?
I've spent weeks trying to create an MWE of this issue before, but couldn't... But I just had another idea to try, so I can poke around a bit more.
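To make the suspected pattern concrete, here is a rough CPU-only sketch of a histogram binning loop written with and without the checks. The function names (`histogram_checked!`, `histogram_unchecked!`) and the binning formula are hypothetical and are not Fable's actual histogram code. This is not a confirmed reproducer of the register allocation error. It only shows where the bounds check and the checked `ceil` show up.

```julia
unsafe_ceil(T, x) = Base.unsafe_trunc(T, round(x, RoundUp))

# Checked version: ceil(Int, ...) can throw InexactError and the
# bins[idx] access can throw BoundsError, so each iteration carries
# error-handling branches (hypothetical stand-in for a histogram kernel)
function histogram_checked!(bins, points, lo, hi)
    n = length(bins)
    for p in points
        idx = ceil(Int, (p - lo) / (hi - lo) * n)
        if 1 <= idx <= n
            bins[idx] += 1
        end
    end
    return bins
end

# Unchecked version: same logic, but with unsafe_ceil and @inbounds.
# The explicit range test keeps the @inbounds access valid.
function histogram_unchecked!(bins, points, lo, hi)
    n = length(bins)
    for p in points
        idx = unsafe_ceil(Int, (p - lo) / (hi - lo) * n)
        if 1 <= idx <= n
            @inbounds bins[idx] += 1
        end
    end
    return bins
end

points = [0.05, 0.15, 0.25, 0.95]
checked = histogram_checked!(zeros(Int, 10), points, 0.0, 1.0)
unchecked = histogram_unchecked!(zeros(Int, 10), points, 0.0, 1.0)

# Both versions should bin valid input identically
@assert checked == unchecked
@assert sum(checked) == length(points)
```

On valid input the two versions give the same result, so if the error really is tied to the checks, swapping one for the other in a GPU kernel would be one way to narrow it down.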