Tutorial fails using Metal.jl

Question

Tutorial fails using Metal.jl

ctessum opened this issue a year ago · 3 comments

Hi,

I am trying to run this tutorial on my laptop, which has an M1 processor. My understanding is that to do this, I should just change CUDA to Metal:

using DiffEqGPU, DifferentialEquations, StaticArrays, Metal

function lorenz2(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz2, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(Metal.MetalBackend()),
    trajectories = 10_000,
    saveat = 1.0f0)

However, when I run the code above, the last line gives the error:

ERROR: InvalidIRError: compiling MethodInstance for DiffEqGPU.gpu_ode_asolve_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::MtlDeviceVector{DiffEqGPU.ImmutableODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz2), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, ::GPUTsit5, ::MtlDeviceMatrix{SVector{3, Float32}, 1}, ::MtlDeviceMatrix{Float32, 1}, ::Float32, ::CallbackSet{Tuple{}, Tuple{}}, ::Nothing, ::Float32, ::Float32, ::StepRangeLen{Float32, Float64, Float64, Int64}, ::Val{false}) resulted in invalid LLVM IR
Reason: unsupported use of double value
Reason: unsupported use of double value
Reason: unsupported use of double value

These are the package versions:

(esml_demo) pkg> status DiffEqGPU
  [071ae1c0] DiffEqGPU v3.3.0
(esml_demo) pkg> status Metal
  [dde4c033] Metal v0.5.1
(esml_demo) pkg> status DifferentialEquations
  [0c46a032] DifferentialEquations v7.11.0

Is this the expected behavior?

Answer 1 · 2023-11-15T17:36:21.000Z

More information in case relevant:

Metal.versioninfo()

macOS 14.0.0, Darwin 23.0.0

Toolchain:
- Julia: 1.9.0
- LLVM: 14.0.6

Julia packages: 
- Metal.jl: 0.5.1
- Metal_LLVM_Tools_jll: 0.5.1+0

1 device:
- Apple M1 (2.406 MiB allocated)

Answer 2 · 2023-11-15T21:12:22.000Z

The Apple M1 does not support Float64 values yet, which is causing some issues with type ::StepRangeLen{Float32, Float64, Float64, Int64} (it turns out some Float64 happens with your CPU's precision). If you remove saveat=1.0f0, it should work.

I am trying to fix it using #317. Thanks for bringing it up!

Answer 3 · 2024-06-27T13:33:04.000Z

I'm getting a different error with the previous tutorial (no saveat). Scaling down the parameters p seems to make it go away. The size of the problem doesn't affect the error, since even trajectories=2 fails with:

Error: No solution found
│   tspan = 0.0f0
│   ts =
│    2-element view(::Matrix{Float32}, :, 1) with eltype Float32:
│     0.0
│     0.0
└ @ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:175
ERROR: Batch solve failed

Code

using DiffEqGPU, OrdinaryDiffEq, StaticArrays, Metal

function lorenz(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p) # this fails
#prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p .* 0.1f0) # this works
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(Metal.MetalBackend()), trajectories = 10_000)

Complete error

1-element ExceptionStack:
LoadError: Batch solve failed
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] #126
    @ ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:176 [inlined]
  [3] (::DiffEqGPU.var"#126#142"{EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, GPUTsit5, Matrix{Float32}})(i::Int64)
    @ DiffEqGPU ./none:0
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] collect(itr::Base.Generator{Base.OneTo{Int64}, DiffEqGPU.var"#126#142"{EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, GPUTsit5, Matrix{Float32}}})
    @ Base ./array.jl:834
  [6] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{MetalBackend}, I::UnitRange{Int64}, adaptive::Bool; kwargs::@Kwargs{unstable_check::DiffEqGPU.var"#114#120"})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:170
  [7] macro expansion
    @ ./timing.jl:395 [inlined]
  [8] __solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{MetalBackend}; trajectories::Int64, batch_size::Int64, unstable_check::Function, adaptive::Bool, kwargs::@Kwargs{})
    @ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:55
  [9] __solve
    @ ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:1 [inlined]
 [10] #solve#45
    @ ~/.julia/packages/DiffEqBase/52czI/src/solve.jl:1096 [inlined]
 [11] top-level scope
    @ ~/Documents/dev/julia-diffeqgpu/stress_test.jl:21
 [12] eval
    @ ./boot.jl:385 [inlined]
 [13] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:2076
 [14] include_string(m::Module, txt::String, fname::String)
    @ Base ./loading.jl:2086
 [15] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
    @ Base ./essentials.jl:892
 [16] invokelatest(::Any, ::Any, ::Vararg{Any})
    @ Base ./essentials.jl:889
 [17] inlineeval(m::Module, code::String, code_line::Int64, code_column::Int64, file::String; softscope::Bool)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:271
 [18] (::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:181
 [19] withpath(f::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams}, path::String)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/repl.jl:276
 [20] (::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:179
 [21] hideprompt(f::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/repl.jl:38
 [22] (::VSCodeServer.var"#67#72"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:150
 [23] with_logstate(f::Function, logstate::Any)
    @ Base.CoreLogging ./logging.jl:515
 [24] with_logger
    @ ./logging.jl:627 [inlined]
 [25] (::VSCodeServer.var"#66#71"{VSCodeServer.ReplRunCodeRequestParams})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:263
 [26] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [27] invokelatest(::Any)
    @ Base ./essentials.jl:889
 [28] (::VSCodeServer.var"#64#65")()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:34
in expression starting at /Users/georgegkountouras/Documents/dev/julia-diffeqgpu/stress_test.jl:21

Package versions

Status `~/Documents/dev/julia-diffeqgpu/Manifest.toml`
⌅ [47edcb42] ADTypes v0.2.7
⌅ [79e6a3ab] Adapt v3.7.2
  [ec485272] ArnoldiMethod v0.4.0
⌃ [4fba245c] ArrayInterface v7.7.1
  [4c555306] ArrayLayouts v1.10.0
  [a9b6321e] Atomix v0.1.0
  [6e4b80f9] BenchmarkTools v1.5.0
  [62783981] BitTwiddlingConvenienceFunctions v0.1.5
⌅ [fa961155] CEnum v0.4.2
  [2a0fbf3d] CPUSummary v0.2.5
  [d360d2e6] ChainRulesCore v1.24.0
  [fb6a15b2] CloseOpenIntervals v0.1.12
  [38540f10] CommonSolve v0.2.4
  [bbf7d656] CommonSubexpressions v0.3.0
  [34da2185] Compat v4.15.0
  [2569d6c7] ConcreteStructs v0.2.3
  [187b0558] ConstructionBase v1.5.5
  [adafc99b] CpuId v0.3.1
  [9a962f9c] DataAPI v1.16.0
  [864edb3b] DataStructures v0.18.20
  [e2d170a0] DataValueInterfaces v1.0.0
⌃ [2b5f629d] DiffEqBase v6.147.3
  [071ae1c0] DiffEqGPU v3.4.1
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [ffbed154] DocStringExtensions v0.9.3
  [4e289a0a] EnumX v1.0.4
⌃ [f151be2c] EnzymeCore v0.6.6
  [d4d017d3] ExponentialUtilities v1.26.1
  [e2ba6199] ExprTools v0.1.10
⌅ [7034ab61] FastBroadcast v0.2.8
  [9aa1b823] FastClosures v0.3.2
  [29a986be] FastLapackInterface v2.0.4
  [1a297f60] FillArrays v1.11.0
  [6a86dc24] FiniteDiff v2.23.1
  [f6369f11] ForwardDiff v0.10.36
  [069b7b12] FunctionWrappers v1.1.3
  [77dc65aa] FunctionWrappersWrappers v0.1.3
⌅ [0c68f7d7] GPUArrays v9.1.0
⌅ [46192b85] GPUArraysCore v0.1.5
⌅ [61eb1bfa] GPUCompiler v0.24.5
  [c145ed77] GenericSchur v0.5.4
  [86223c79] Graphs v1.11.1
  [3e5b6fbb] HostCPUFeatures v0.1.16
  [615f187c] IfElse v0.1.1
  [d25df0c9] Inflate v0.1.5
  [92d709cd] IrrationalConstants v0.2.2
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.5.0
  [682c06a0] JSON v0.21.4
⌅ [ef3ab10e] KLU v0.4.1
⌃ [63c18a36] KernelAbstractions v0.9.18
  [ba0b0d4f] Krylov v0.9.6
⌅ [929cbde3] LLVM v6.6.3
  [10f19ff3] LayoutPointers v0.1.15
⌅ [5078a376] LazyArrays v1.10.0
  [d3d80556] LineSearches v7.2.0
⌃ [7ed4a6bd] LinearSolve v2.22.1
  [2ab3a3ac] LogExpFunctions v0.3.28
  [bdcacae8] LoopVectorization v0.12.170
  [1914dd2f] MacroTools v0.5.13
  [d125e4d3] ManualMemory v0.1.8
⌅ [a3b82374] MatrixFactorizations v2.2.0
  [bb5d69b7] MaybeInplace v0.1.3
⌃ [dde4c033] Metal v0.5.1
  [46d2c3a1] MuladdMacro v0.2.4
  [d41bc354] NLSolversBase v7.8.3
  [77ba4419] NaNMath v1.0.2
⌃ [8913a72c] NonlinearSolve v3.8.3
  [d8793406] ObjectFile v0.4.1
⌅ [e86c9b32] ObjectiveC v1.1.0
  [6fe1bfb0] OffsetArrays v1.14.0
  [bac558e1] OrderedCollections v1.6.3
⌃ [1dea7af3] OrdinaryDiffEq v6.80.1
  [65ce6f38] PackageExtensionCompat v1.0.2
  [d96e819e] Parameters v0.12.3
  [69de0a69] Parsers v2.8.1
  [f517fe37] Polyester v0.7.14
  [1d0040c9] PolyesterWeave v0.2.1
  [d236fae5] PreallocationTools v0.4.22
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [3cdcf5f2] RecipesBase v1.3.4
⌃ [731186ca] RecursiveArrayTools v3.13.0
  [f2c3362d] RecursiveFactorization v0.2.23
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [7e49a35a] RuntimeGeneratedFunctions v0.5.13
  [94e857df] SIMDTypes v0.1.0
  [476501e8] SLEEFPirates v0.6.42
⌃ [0bca4576] SciMLBase v2.31.0
  [c0aeaf25] SciMLOperators v0.3.8
⌃ [53ae85a6] SciMLStructures v1.2.0
  [6c6a2e73] Scratch v1.2.1
  [efcf1570] Setfield v1.1.1
  [05bca326] SimpleDiffEq v1.11.1
⌃ [727e6d20] SimpleNonlinearSolve v1.6.0
  [699a6c99] SimpleTraits v0.9.4
  [ce78b400] SimpleUnPack v1.1.0
⌃ [47a9eef4] SparseDiffTools v2.18.0
  [e56a9233] Sparspak v0.3.9
  [276daf66] SpecialFunctions v2.4.0
  [aedffcd0] Static v0.8.10
  [0d7ed370] StaticArrayInterface v1.5.0
  [90137ffa] StaticArrays v1.9.5
  [1e83bf80] StaticArraysCore v1.4.3
  [7792a7ef] StrideArraysCore v0.5.6
  [53d494c1] StructIO v0.3.0
⌃ [2efcf032] SymbolicIndexingInterface v0.3.11
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.11.1
  [8290d209] ThreadingUtilities v0.5.2
  [a759f4b9] TimerOutputs v0.5.24
  [d5829a12] TriangularSolve v0.2.0
  [410a4b4d] Tricks v0.1.8
  [781d530d] TruncatedStacktraces v1.4.0
  [3a884ed6] UnPack v1.0.2
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.1.4
  [3d5dd08c] VectorizationBase v0.21.68
  [19fa3120] VertexSafeGraphs v0.2.0
  [700de1a5] ZygoteRules v0.2.5
  [6e34b625] Bzip2_jll v1.0.8+1
  [2e619515] Expat_jll v2.6.2+0
  [1d5cc7b8] IntelOpenMP_jll v2024.1.0+0
⌅ [dad2f222] LLVMExtra_jll v0.0.29+0
  [7106de7a] LibMPDec_jll v2.5.1+0
⌅ [e9f186c6] Libffi_jll v3.2.2+1
  [856f044c] MKL_jll v2024.1.0+0
  [0418c028] Metal_LLVM_Tools_jll v0.5.1+0
  [458c3c95] OpenSSL_jll v3.0.14+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [93d3a430] Python_jll v3.10.14+0
  [76ed43ae] SQLite_jll v3.45.3+0
  [ffd25f8a] XZ_jll v5.4.6+0
  [1317d2d5] oneTBB_jll v2021.12.0+0
  [0dad84c5] ArgTools v1.1.1
  [56f22d72] Artifacts
  [2a0f44e3] Base64
  [ade2ca70] Dates
  [8ba89e20] Distributed
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching
  [9fa8497b] Future
  [b77e0a4c] InteractiveUtils
  [4af54fe1] LazyArtifacts
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2
  [8f399da3] Libdl
  [37e2e46d] LinearAlgebra
  [56ddb016] Logging
  [d6f4376e] Markdown
  [a63ad114] Mmap
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.10.0
  [de0858da] Printf
  [9abbd945] Profile
  [3fa0cd96] REPL
  [9a3f8284] Random
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization
  [1a1011a3] SharedArrays
  [6462fe0b] Sockets
  [2f01184e] SparseArrays v1.10.0
  [10745b16] Statistics v1.10.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test
  [cf7118a7] UUIDs
  [4ec0a83e] Unicode
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.4.0+0
  [e37daf67] LibGit2_jll v1.6.4+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.2+1
  [14a3606d] MozillaCACerts_jll v2023.1.10
  [4536629a] OpenBLAS_jll v0.3.23+4
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.2.1+1
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.8.0+1
  [8e850ede] nghttp2_jll v1.52.0+1
  [3f19e933] p7zip_jll v17.4.0+2

Metal.versioninfo()

macOS 14.6.0, Darwin 23.6.0

Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7

Julia packages: 
- Metal.jl: 0.5.1
- Metal_LLVM_Tools_jll: 0.5.1+0

1 device:
- Apple M1 Max (1.625 MiB allocated)