Compilation failure on 1.11
As frequently seen on CI:
gpuarrays/base (2) | failed at 2024-06-18T06:24:37.874
TaskFailedException
nested task error: InterruptException:
Stacktrace:
[1] try_yieldto(undo::typeof(identity))
@ Base ./task.jl:944
[2] throwto
@ ./task.jl:956 [inlined]
[3] (::var"#34#43"{Dict{String, DateTime}, Task, var"#recycle_worker#42"})()
@ Main /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-2.0/build/default-macmini-aarch64-2-0/julialang/metal-dot-jl/test/runtests.jl:284
Testing finished in 6 minutes, 36 seconds, 592 milliseconds
Worker 2 failed running test gpuarrays/base:
Some tests did not pass: 91 passed, 0 failed, 2 errored, 0 broken.
gpuarrays/base: Error During Test at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-2.0/depots/5cd495a2-4a16-4674-ae02-c839447744bb/packages/GPUArrays/HjWFN/test/testsuite/base.jl:342
Test threw exception
Expression: compare(view, AT, a, view(i, 2:2))
Compilation to native code failed; see below for details.
If you think this is a bug, please file an issue and attach /private/var/tmp/agent-tempdirs/default-macmini-aarch64-2.0/tmp/jl_T2TyvmFyF8.metallib
I reduced this to:
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx13.3.1"
declare void @llvm.trap()
define void @kernel(i32 %threads_per_grid, i32 %thread_position_in_grid) {
%.fca.1.0.1.0.extract = load i8, i8 addrspace(1)* null, align 1
%.unpack9.unpack = load i64, i64 addrspace(1)* null, align 8
%.not3 = icmp eq i8 %.fca.1.0.1.0.extract, 0
br i1 %.not3, label %L238.us.peel, label %L28
L238.us.peel:
%.not.us.peel = icmp eq i32 %thread_position_in_grid, 0
br i1 %.not.us.peel, label %L254, label %L30.us
L30.us:
%.not1.us = icmp eq i32 %threads_per_grid, 0
br i1 %.not1.us, label %L186.us, label %L42
L186.us:
%.not7.us = icmp ult i64 0, %.unpack9.unpack
br i1 %.not7.us, label %L30.us, label %L254
L28:
%.not149 = icmp ult i64 %.unpack9.unpack, 2
br i1 %.not149, label %L254, label %L42
L42:
call void @llvm.trap()
unreachable
L254:
ret void
}
!air.kernel = !{!0}
!air.version = !{!7}
!0 = !{void (i32, i32)* @kernel, !1, !2}
!1 = !{}
!2 = !{!5, !6}
!5 = !{i32 2, !"air.threads_per_grid", !"air.arg_type_name", !"uint"}
!6 = !{i32 3, !"air.thread_position_in_grid", !"air.arg_type_name", !"uint"}
!7 = !{i32 2, i32 5, i32 0}
It's not obvious what's wrong here. The generated crash reports just point to a crash in the back-end, specifically during a machine function pass (llvm::MachineFunctionPass::runOnFunction).
Original and reduced bitcode & libraries: libraries.zip
This crashes on macOS 14.5, on the macOS 15 beta, and with the offline compiler from the Xcode 16 beta. I've reported it to Apple.
This is fixed on the latest macOS 15 beta. We can work around the issue by replacing our trap + unreachable with a branch to the exit block. Rewrites like that are not entirely trivial, so I'm inclined to just urge Julia 1.11 users to upgrade to macOS 15...
I'm fine with that considering the upgrade patterns of macOS users, but how will we deal with CI? Will we upgrade all machines, or just the one that 1.11 always uses? I guess if we'll be supporting 1.10 on macOS 13+, we should keep at least one machine on macOS 13.
I opted for implementing the suggested fix.
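For illustration only, here is roughly what the suggested rewrite would look like when applied by hand to the reduced IR above. This is a sketch, not the actual patch: it assumes the trap call is dropped together with the unreachable, which the thread does not spell out, and it simply reuses the existing L254 block as the exit block.

```llvm
; Before (from the reduced IR above):
;   L42:
;     call void @llvm.trap()
;     unreachable

; After: the block falls through to the existing exit block instead.
L42:
  br label %L254

L254:
  ret void
```

With this change, control flow that reaches L42 returns from the kernel normally, so any error reporting that relied on the trap would have to happen elsewhere.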