Tutorial fails using Metal.jl
ctessum opened this issue · 3 comments
Hi,
I am trying to run this tutorial on my laptop, which has an M1 processor. My understanding is that to do this, I should just change CUDA to Metal:
using DiffEqGPU, DifferentialEquations, StaticArrays, Metal
function lorenz2(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end
u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz2, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(Metal.MetalBackend()),
    trajectories = 10_000,
    saveat = 1.0f0)
However, when I run the code above, the last line gives the error:
ERROR: InvalidIRError: compiling MethodInstance for DiffEqGPU.gpu_ode_asolve_kernel(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::MtlDeviceVector{DiffEqGPU.ImmutableODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz2), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, 1}, ::GPUTsit5, ::MtlDeviceMatrix{SVector{3, Float32}, 1}, ::MtlDeviceMatrix{Float32, 1}, ::Float32, ::CallbackSet{Tuple{}, Tuple{}}, ::Nothing, ::Float32, ::Float32, ::StepRangeLen{Float32, Float64, Float64, Int64}, ::Val{false}) resulted in invalid LLVM IR
Reason: unsupported use of double value
Reason: unsupported use of double value
Reason: unsupported use of double value
These are the package versions:
(esml_demo) pkg> status DiffEqGPU
[071ae1c0] DiffEqGPU v3.3.0
(esml_demo) pkg> status Metal
[dde4c033] Metal v0.5.1
(esml_demo) pkg> status DifferentialEquations
[0c46a032] DifferentialEquations v7.11.0
Is this the expected behavior?
More information, in case it is relevant:
Metal.versioninfo()
macOS 14.0.0, Darwin 23.0.0
Toolchain:
- Julia: 1.9.0
- LLVM: 14.0.6
Julia packages:
- Metal.jl: 0.5.1
- Metal_LLVM_Tools_jll: 0.5.1+0
1 device:
- Apple M1 (2.406 MiB allocated)
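A note on reading this error: "unsupported use of double value" means the compiled kernel still contains Float64 somewhere, and the kernel's argument types are listed in the message. As a rough sketch, a small recursive check over type parameters can flag which of those argument types carry Float64. The helper name has_float64 is hypothetical (it is not part of DiffEqGPU or Metal.jl), and it only inspects type parameters, so a Float64 field inside a struct, a Union, or a UnionAll would go unnoticed.

```julia
# Hypothetical diagnostic helper: does Float64 appear anywhere in the
# type parameters of `T`? Field types of structs are NOT inspected,
# and Union / UnionAll types are not expanded.
function has_float64(T)
    T === Float64 && return true
    # Parameters can also be plain values such as `false` or `1`; skip those.
    T isa DataType || return false
    return any(has_float64, T.parameters)
end

# Two argument types copied from the InvalidIRError message above.
tspan_type  = Tuple{Float32, Float32}
saveat_type = StepRangeLen{Float32, Float64, Float64, Int64}

println(has_float64(tspan_type))   # false
println(has_float64(saveat_type))  # true
```

In the signature above, Float64 appears only in the StepRangeLen argument, so that is the one this kind of check would single out.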
The Apple M1 does not support Float64 values yet, which causes problems with the type ::StepRangeLen{Float32, Float64, Float64, Int64}: its elements are Float32, but its reference value and step are stored as Float64 for extra precision. If you remove saveat=1.0f0, it should work.
I am trying to fix it using #317. Thanks for bringing it up!
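To see where the Float64 in that type comes from, a Float32 range with the same start, step and stop as tspan = (0.0f0, 10.0f0) and saveat = 1.0f0 can be inspected on the CPU. The sketch below also builds two save-point containers with no Float64 inside. Whether DiffEqGPU v3.3.0 accepts either of them as saveat is an untested assumption; the known workaround remains dropping saveat, as suggested above.

```julia
# A Float32 range like the one in the error: 0, 1, ..., 10.
r = 0.0f0:1.0f0:10.0f0
println(typeof(r))   # StepRangeLen{Float32, Float64, Float64, Int64}
println(eltype(r))   # Float32, even though ref and step are Float64

# Two containers with the same save points and no Float64 inside.
# Assumption: passing either as saveat avoids the error (not verified).
r32 = StepRangeLen(0.0f0, 1.0f0, 11)   # ref, step and elements all Float32
v32 = collect(r)                       # plain Vector{Float32}

println(collect(r32) == v32)           # true
```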
I'm getting a different error with the previous tutorial (no saveat). Scaling down the parameters p seems to make it go away. The size of the problem doesn't matter: even trajectories=2 fails with:
┌ Error: No solution found
│ tspan = 0.0f0
│ ts =
│ 2-element view(::Matrix{Float32}, :, 1) with eltype Float32:
│ 0.0
│ 0.0
└ @ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:175
ERROR: Batch solve failed
Code
using DiffEqGPU, OrdinaryDiffEq, StaticArrays, Metal
function lorenz(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end
u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p) # this fails
#prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p .* 0.1f0) # this works
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)
sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(Metal.MetalBackend()), trajectories = 10_000)
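One detail worth keeping in the working variant: the scale factor is written 0.1f0. Given the earlier note that the M1 does not support Float64, the sketch below (plain tuples instead of SVector, so it only needs Base) checks that a Float64 literal such as 0.1 would silently promote the parameters to Float64. This is only a type check; it does not explain the "No solution found" error itself.

```julia
# Base parameters from the reproducer; `8 / 3.0f0` promotes Int to Float32.
p = (10.0f0, 28.0f0, 8 / 3.0f0)

# Random per-trajectory scaling, mirroring prob_func with a Tuple instead of an SVector.
scale = ntuple(_ -> rand(Float32), 3)

p_fail = scale .* p            # entries below (10, 28, 8/3), all Float32
p_work = scale .* p .* 0.1f0   # ten times smaller, still all Float32
p_f64  = scale .* p .* 0.1     # Float64 literal: every entry becomes Float64

println(all(x -> x isa Float32, p_work))  # true
println(all(x -> x isa Float64, p_f64))   # true
```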
Complete error
1-element ExceptionStack:
LoadError: Batch solve failed
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] #126
@ ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:176 [inlined]
[3] (::DiffEqGPU.var"#126#142"{EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, GPUTsit5, Matrix{Float32}})(i::Int64)
@ DiffEqGPU ./none:0
[4] iterate
@ ./generator.jl:47 [inlined]
[5] collect(itr::Base.Generator{Base.OneTo{Int64}, DiffEqGPU.var"#126#142"{EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, GPUTsit5, Matrix{Float32}}})
@ Base ./array.jl:834
[6] batch_solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{MetalBackend}, I::UnitRange{Int64}, adaptive::Bool; kwargs::@Kwargs{unstable_check::DiffEqGPU.var"#114#120"})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:170
[7] macro expansion
@ ./timing.jl:395 [inlined]
[8] __solve(ensembleprob::EnsembleProblem{ODEProblem{SVector{3, Float32}, Tuple{Float32, Float32}, false, SVector{3, Float32}, ODEFunction{false, SciMLBase.AutoSpecialize, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing, Nothing, Nothing, Nothing}, @Kwargs{}, SciMLBase.StandardODEProblem}, var"#147#148", typeof(SciMLBase.DEFAULT_OUTPUT_FUNC), typeof(SciMLBase.DEFAULT_REDUCTION), Nothing}, alg::GPUTsit5, ensemblealg::EnsembleGPUKernel{MetalBackend}; trajectories::Int64, batch_size::Int64, unstable_check::Function, adaptive::Bool, kwargs::@Kwargs{})
@ DiffEqGPU ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:55
[9] __solve
@ ~/.julia/packages/DiffEqGPU/I999k/src/solve.jl:1 [inlined]
[10] #solve#45
@ ~/.julia/packages/DiffEqBase/52czI/src/solve.jl:1096 [inlined]
[11] top-level scope
@ ~/Documents/dev/julia-diffeqgpu/stress_test.jl:21
[12] eval
@ ./boot.jl:385 [inlined]
[13] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
@ Base ./loading.jl:2076
[14] include_string(m::Module, txt::String, fname::String)
@ Base ./loading.jl:2086
[15] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::@Kwargs{})
@ Base ./essentials.jl:892
[16] invokelatest(::Any, ::Any, ::Vararg{Any})
@ Base ./essentials.jl:889
[17] inlineeval(m::Module, code::String, code_line::Int64, code_column::Int64, file::String; softscope::Bool)
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:271
[18] (::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:181
[19] withpath(f::VSCodeServer.var"#69#74"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams}, path::String)
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/repl.jl:276
[20] (::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:179
[21] hideprompt(f::VSCodeServer.var"#68#73"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/repl.jl:38
[22] (::VSCodeServer.var"#67#72"{Bool, Bool, Bool, Module, String, Int64, Int64, String, VSCodeServer.ReplRunCodeRequestParams})()
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:150
[23] with_logstate(f::Function, logstate::Any)
@ Base.CoreLogging ./logging.jl:515
[24] with_logger
@ ./logging.jl:627 [inlined]
[25] (::VSCodeServer.var"#66#71"{VSCodeServer.ReplRunCodeRequestParams})()
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:263
[26] #invokelatest#2
@ ./essentials.jl:892 [inlined]
[27] invokelatest(::Any)
@ Base ./essentials.jl:889
[28] (::VSCodeServer.var"#64#65")()
@ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.79.2/scripts/packages/VSCodeServer/src/eval.jl:34
in expression starting at /Users/georgegkountouras/Documents/dev/julia-diffeqgpu/stress_test.jl:21
Package versions
Status `~/Documents/dev/julia-diffeqgpu/Manifest.toml`
⌅ [47edcb42] ADTypes v0.2.7
⌅ [79e6a3ab] Adapt v3.7.2
[ec485272] ArnoldiMethod v0.4.0
⌃ [4fba245c] ArrayInterface v7.7.1
[4c555306] ArrayLayouts v1.10.0
[a9b6321e] Atomix v0.1.0
[6e4b80f9] BenchmarkTools v1.5.0
[62783981] BitTwiddlingConvenienceFunctions v0.1.5
⌅ [fa961155] CEnum v0.4.2
[2a0fbf3d] CPUSummary v0.2.5
[d360d2e6] ChainRulesCore v1.24.0
[fb6a15b2] CloseOpenIntervals v0.1.12
[38540f10] CommonSolve v0.2.4
[bbf7d656] CommonSubexpressions v0.3.0
[34da2185] Compat v4.15.0
[2569d6c7] ConcreteStructs v0.2.3
[187b0558] ConstructionBase v1.5.5
[adafc99b] CpuId v0.3.1
[9a962f9c] DataAPI v1.16.0
[864edb3b] DataStructures v0.18.20
[e2d170a0] DataValueInterfaces v1.0.0
⌃ [2b5f629d] DiffEqBase v6.147.3
[071ae1c0] DiffEqGPU v3.4.1
[163ba53b] DiffResults v1.1.0
[b552c78f] DiffRules v1.15.1
[ffbed154] DocStringExtensions v0.9.3
[4e289a0a] EnumX v1.0.4
⌃ [f151be2c] EnzymeCore v0.6.6
[d4d017d3] ExponentialUtilities v1.26.1
[e2ba6199] ExprTools v0.1.10
⌅ [7034ab61] FastBroadcast v0.2.8
[9aa1b823] FastClosures v0.3.2
[29a986be] FastLapackInterface v2.0.4
[1a297f60] FillArrays v1.11.0
[6a86dc24] FiniteDiff v2.23.1
[f6369f11] ForwardDiff v0.10.36
[069b7b12] FunctionWrappers v1.1.3
[77dc65aa] FunctionWrappersWrappers v0.1.3
⌅ [0c68f7d7] GPUArrays v9.1.0
⌅ [46192b85] GPUArraysCore v0.1.5
⌅ [61eb1bfa] GPUCompiler v0.24.5
[c145ed77] GenericSchur v0.5.4
[86223c79] Graphs v1.11.1
[3e5b6fbb] HostCPUFeatures v0.1.16
[615f187c] IfElse v0.1.1
[d25df0c9] Inflate v0.1.5
[92d709cd] IrrationalConstants v0.2.2
[82899510] IteratorInterfaceExtensions v1.0.0
[692b3bcd] JLLWrappers v1.5.0
[682c06a0] JSON v0.21.4
⌅ [ef3ab10e] KLU v0.4.1
⌃ [63c18a36] KernelAbstractions v0.9.18
[ba0b0d4f] Krylov v0.9.6
⌅ [929cbde3] LLVM v6.6.3
[10f19ff3] LayoutPointers v0.1.15
⌅ [5078a376] LazyArrays v1.10.0
[d3d80556] LineSearches v7.2.0
⌃ [7ed4a6bd] LinearSolve v2.22.1
[2ab3a3ac] LogExpFunctions v0.3.28
[bdcacae8] LoopVectorization v0.12.170
[1914dd2f] MacroTools v0.5.13
[d125e4d3] ManualMemory v0.1.8
⌅ [a3b82374] MatrixFactorizations v2.2.0
[bb5d69b7] MaybeInplace v0.1.3
⌃ [dde4c033] Metal v0.5.1
[46d2c3a1] MuladdMacro v0.2.4
[d41bc354] NLSolversBase v7.8.3
[77ba4419] NaNMath v1.0.2
⌃ [8913a72c] NonlinearSolve v3.8.3
[d8793406] ObjectFile v0.4.1
⌅ [e86c9b32] ObjectiveC v1.1.0
[6fe1bfb0] OffsetArrays v1.14.0
[bac558e1] OrderedCollections v1.6.3
⌃ [1dea7af3] OrdinaryDiffEq v6.80.1
[65ce6f38] PackageExtensionCompat v1.0.2
[d96e819e] Parameters v0.12.3
[69de0a69] Parsers v2.8.1
[f517fe37] Polyester v0.7.14
[1d0040c9] PolyesterWeave v0.2.1
[d236fae5] PreallocationTools v0.4.22
[aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.4.3
[3cdcf5f2] RecipesBase v1.3.4
⌃ [731186ca] RecursiveArrayTools v3.13.0
[f2c3362d] RecursiveFactorization v0.2.23
[189a3867] Reexport v1.2.2
[ae029012] Requires v1.3.0
[7e49a35a] RuntimeGeneratedFunctions v0.5.13
[94e857df] SIMDTypes v0.1.0
[476501e8] SLEEFPirates v0.6.42
⌃ [0bca4576] SciMLBase v2.31.0
[c0aeaf25] SciMLOperators v0.3.8
⌃ [53ae85a6] SciMLStructures v1.2.0
[6c6a2e73] Scratch v1.2.1
[efcf1570] Setfield v1.1.1
[05bca326] SimpleDiffEq v1.11.1
⌃ [727e6d20] SimpleNonlinearSolve v1.6.0
[699a6c99] SimpleTraits v0.9.4
[ce78b400] SimpleUnPack v1.1.0
⌃ [47a9eef4] SparseDiffTools v2.18.0
[e56a9233] Sparspak v0.3.9
[276daf66] SpecialFunctions v2.4.0
[aedffcd0] Static v0.8.10
[0d7ed370] StaticArrayInterface v1.5.0
[90137ffa] StaticArrays v1.9.5
[1e83bf80] StaticArraysCore v1.4.3
[7792a7ef] StrideArraysCore v0.5.6
[53d494c1] StructIO v0.3.0
⌃ [2efcf032] SymbolicIndexingInterface v0.3.11
[3783bdb8] TableTraits v1.0.1
[bd369af6] Tables v1.11.1
[8290d209] ThreadingUtilities v0.5.2
[a759f4b9] TimerOutputs v0.5.24
[d5829a12] TriangularSolve v0.2.0
[410a4b4d] Tricks v0.1.8
[781d530d] TruncatedStacktraces v1.4.0
[3a884ed6] UnPack v1.0.2
[013be700] UnsafeAtomics v0.2.1
[d80eeb9a] UnsafeAtomicsLLVM v0.1.4
[3d5dd08c] VectorizationBase v0.21.68
[19fa3120] VertexSafeGraphs v0.2.0
[700de1a5] ZygoteRules v0.2.5
[6e34b625] Bzip2_jll v1.0.8+1
[2e619515] Expat_jll v2.6.2+0
[1d5cc7b8] IntelOpenMP_jll v2024.1.0+0
⌅ [dad2f222] LLVMExtra_jll v0.0.29+0
[7106de7a] LibMPDec_jll v2.5.1+0
⌅ [e9f186c6] Libffi_jll v3.2.2+1
[856f044c] MKL_jll v2024.1.0+0
[0418c028] Metal_LLVM_Tools_jll v0.5.1+0
[458c3c95] OpenSSL_jll v3.0.14+0
[efe28fd5] OpenSpecFun_jll v0.5.5+0
[93d3a430] Python_jll v3.10.14+0
[76ed43ae] SQLite_jll v3.45.3+0
[ffd25f8a] XZ_jll v5.4.6+0
[1317d2d5] oneTBB_jll v2021.12.0+0
[0dad84c5] ArgTools v1.1.1
[56f22d72] Artifacts
[2a0f44e3] Base64
[ade2ca70] Dates
[8ba89e20] Distributed
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching
[9fa8497b] Future
[b77e0a4c] InteractiveUtils
[4af54fe1] LazyArtifacts
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.10.0
[de0858da] Printf
[9abbd945] Profile
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays v1.10.0
[10745b16] Statistics v1.10.0
[4607b0f0] SuiteSparse
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.4.0+0
[e37daf67] LibGit2_jll v1.6.4+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.2+1
[14a3606d] MozillaCACerts_jll v2023.1.10
[4536629a] OpenBLAS_jll v0.3.23+4
[05823500] OpenLibm_jll v0.8.1+2
[bea87d4a] SuiteSparse_jll v7.2.1+1
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.8.0+1
[8e850ede] nghttp2_jll v1.52.0+1
[3f19e933] p7zip_jll v17.4.0+2
Metal.versioninfo()
macOS 14.6.0, Darwin 23.6.0
Toolchain:
- Julia: 1.10.4
- LLVM: 15.0.7
Julia packages:
- Metal.jl: 0.5.1
- Metal_LLVM_Tools_jll: 0.5.1+0
1 device:
- Apple M1 Max (1.625 MiB allocated)