JuliaGPU/GPUArrays.jl

Matrix multiplication by static matrix

jonniedie opened this issue · 3 comments

It looks like this is falling to the standard generic_matmatmul!, which errors on scalar indexing

julia> using LinearAlgebra, Metal, StaticArrays

julia> Metal.GPUArrays.allowscalar(false)

julia> A = one(SMatrix{3,3}); B = MtlArray(ones(Float32, 3, 1000)); C = copy(B);

julia> C .= A * B
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:87
 [3] getindex
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:9 [inlined]
 [4] _generic_matmatmul!(C::MtlArray{Float64, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:876
 [5] generic_matmatmul!(C::MtlArray{Float64, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:844
 [6] mul!
   @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:303 [inlined]
 [7] mul!
   @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:276 [inlined]
 [8] *(A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:141
 [9] top-level scope
   @ REPL[59]:1

It also happens if I skip straight to mul!

julia> mul!(C, A, B, true, false)
ERROR: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] assertscalar(op::String)
   @ GPUArraysCore ~/.julia/packages/GPUArraysCore/lojQM/src/GPUArraysCore.jl:87
 [3] getindex
   @ ~/.julia/packages/GPUArrays/fqD8z/src/host/indexing.jl:9 [inlined]
 [4] _generic_matmatmul!(C::MtlArray{Float32, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:876
 [5] generic_matmatmul!(C::MtlArray{Float32, 2}, tA::Char, tB::Char, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:844
 [6] mul!(C::MtlArray{Float32, 2}, A::SMatrix{3, 3, Float64, 9}, B::MtlArray{Float32, 2}, alpha::Bool, beta::Bool)
   @ LinearAlgebra /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/LinearAlgebra/src/matmul.jl:303
 [7] top-level scope
   @ REPL[60]:1

Here are the versions:

julia> versioninfo()
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 4 virtual cores
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

(jl_eeJwW1) pkg> st
Status `/private/var/folders/jk/zbhsb6bd25b9swwhb_3d6v340000gn/T/jl_eeJwW1/Project.toml`
  [dde4c033] Metal v0.1.1
  [90137ffa] StaticArrays v1.5.7
  [37e2e46d] LinearAlgebra

I'm not sure if we should support this. None of the other GPU back-ends do either, and it's not clear where to draw the line in supporting multiplication (or other operations) with non-GPU array types. Currently we only support multiplication (and other operations) between (possibly wrapped) GPU arrays.

I'd argue that StaticArrays should be given special consideration here (not just for Metal, but for all GPU backends). All of the other array packages are generally just thin wrappers that can be applied to any array and don't need any special consideration (since you can just wrap whatever array you have handy). StaticArrays are especially useful for GPU computations in that they can be instantiated on the fly without new allocations. It saves a ton of trouble from having to preallocate a container for some intermediate calculation.

Yeah, sure. I wouldn't be against a PR adding the necessary functionality, but it's not very critical so I won't myself get to work on this. Realistically, you'd also want to improve the GEMM implementation here, because the fallback is awfully slow (ideally making use of GEMMKernels.jl).