JuliaGPU/Metal.jl

Compilation failure on 1.11

As frequently seen on CI:

gpuarrays/base                                (2) |         failed at 2024-06-18T06:24:37.874
TaskFailedException
    nested task error: InterruptException:
    Stacktrace:
     [1] try_yieldto(undo::typeof(identity))
       @ Base ./task.jl:944
     [2] throwto
       @ ./task.jl:956 [inlined]
     [3] (::var"#34#43"{Dict{String, DateTime}, Task, var"#recycle_worker#42"})()
       @ Main /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-2.0/build/default-macmini-aarch64-2-0/julialang/metal-dot-jl/test/runtests.jl:284
Testing finished in 6 minutes, 36 seconds, 592 milliseconds
Worker 2 failed running test gpuarrays/base:
Some tests did not pass: 91 passed, 0 failed, 2 errored, 0 broken.
gpuarrays/base: Error During Test at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-2.0/depots/5cd495a2-4a16-4674-ae02-c839447744bb/packages/GPUArrays/HjWFN/test/testsuite/base.jl:342
  Test threw exception
  Expression: compare(view, AT, a, view(i, 2:2))
  Compilation to native code failed; see below for details.
  If you think this is a bug, please file an issue and attach /private/var/tmp/agent-tempdirs/default-macmini-aarch64-2.0/tmp/jl_T2TyvmFyF8.metallib

I reduced this to:

target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx13.3.1"

declare void @llvm.trap()

define void @kernel(i32 %threads_per_grid, i32 %thread_position_in_grid) {
  %.fca.1.0.1.0.extract = load i8, i8 addrspace(1)* null, align 1
  %.unpack9.unpack = load i64, i64 addrspace(1)* null, align 8
  %.not3 = icmp eq i8 %.fca.1.0.1.0.extract, 0
  br i1 %.not3, label %L238.us.peel, label %L28

L238.us.peel:
  %.not.us.peel = icmp eq i32 %thread_position_in_grid, 0
  br i1 %.not.us.peel, label %L254, label %L30.us

L30.us:
  %.not1.us = icmp eq i32 %threads_per_grid, 0
  br i1 %.not1.us, label %L186.us, label %L42

L186.us:
  %.not7.us = icmp ult i64 0, %.unpack9.unpack
  br i1 %.not7.us, label %L30.us, label %L254

L28:
  %.not149 = icmp ult i64 %.unpack9.unpack, 2
  br i1 %.not149, label %L254, label %L42

L42:
  call void @llvm.trap()
  unreachable

L254:
  ret void
}

!air.kernel = !{!0}
!air.version = !{!7}

!0 = !{void (i32, i32)* @kernel, !1, !2}
!1 = !{}
!2 = !{!5, !6}
!5 = !{i32 2, !"air.threads_per_grid", !"air.arg_type_name", !"uint"}
!6 = !{i32 3, !"air.thread_position_in_grid", !"air.arg_type_name", !"uint"}
!7 = !{i32 2, i32 5, i32 0}

It's not obvious what's wrong here. The generated crash reports just point to a crash in the back-end, specifically during a machine function pass (llvm::MachineFunctionPass::runOnFunction).

Original and reduced bitcode & libraries: libraries.zip

This crashes on macOS 14.5, on the macOS 15 beta, and with the offline compiler from the Xcode 16 beta. I've reported it to Apple.

This is fixed in the latest macOS 15 beta. We can work around the issue by replacing our trap + unreachable with a branch to the exit block. Rewrites like that are not entirely trivial, so I'm inclined to just urge Julia 1.11 users to upgrade to macOS 15...

I'm fine with that considering the upgrade patterns of macOS users, but how will we deal with CI? Will we upgrade all machines, or just one that 1.11 always uses? I guess if we'll be supporting 1.10 on macOS 13+, we should keep at least one machine on macOS 13.

I opted for implementing the suggested fix.
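
For readers following along, here is a minimal sketch of what the workaround discussed above (replacing trap + unreachable with a branch to the exit block) could look like at the IR level. It is a simplified, hypothetical kernel that reuses the block names L42 and L254 from the reduced reproducer; it is not the code Metal.jl actually emits, and the AIR metadata is omitted:

```llvm
; Hypothetical, simplified kernel: only the shape of the rewrite matters here.
define void @kernel(i32 %threads_per_grid, i32 %thread_position_in_grid) {
  %is_zero = icmp eq i32 %threads_per_grid, 0
  br i1 %is_zero, label %L254, label %L42

L42:
  ; Previously:
  ;   call void @llvm.trap()
  ;   unreachable
  ; Now the error path jumps to the existing exit block instead.
  br label %L254

L254:
  ret void
}
```

Note that with this shape the error path returns normally instead of aborting the thread, so any error reporting has to happen before the branch.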