Metal programming in Julia
With Metal.jl it's possible to program GPUs on macOS using the Metal programming framework.
The package is very much a work in progress: there are many bugs and lots of functionality is missing. Expect to have to make changes to the code if you want to use it. PRs are very welcome!
To use the package you will need:
- Mac device with M-series chip
- Julia 1.8 (this package currently requires LLVM 13)
- macOS Catalina 10.15 or newer (might also work on 10.14)
- Xcode Command Line Tools (for now; this requirement will be removed)
As Metal.jl currently depends on bleeding-edge versions of certain dependencies, it is recommended to use the Manifest that is part of the repository. Start by cloning Metal.jl, open a terminal in that directory, and start Julia with the --project flag (or activate the environment via the Pkg REPL, as shown after the transcript below):
$ git clone https://github.com/JuliaGPU/Metal.jl && cd Metal.jl
$ julia --project -e 'using Pkg; Pkg.build(); Pkg.instantiate()'
$ julia --project
julia> using Metal
julia> Metal.versioninfo()
macOS 12.2.0, Darwin 21.3.0
Toolchain:
- Julia: 1.8.0-beta3
- LLVM: 13.0.1
1 device:
- Apple M1 Pro (64.000 KiB allocated)
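As an alternative to the --project flag, the environment can also be activated from the Pkg REPL (press ] at the julia> prompt); a minimal sketch of that route:

(@v1.8) pkg> activate .
(Metal) pkg> build
(Metal) pkg> instantiate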
The easiest way to work with Metal.jl is through its array abstraction. The MtlArray type is meant both as a convenient container for device memory and as a data-parallel abstraction for using the GPU without writing your own kernels:
julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
1
julia> a .+ 1
1-element MtlArray{Int64, 1}:
2
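Because MtlArray plugs into Julia's standard array interfaces, common data-parallel patterns compose without writing kernels. A minimal sketch, assuming the usual GPUArrays-style broadcast and reduction support (Float32 is used since Apple GPUs do not support double precision):

using Metal

a = MtlArray(rand(Float32, 1024))
b = MtlArray(rand(Float32, 1024))

c = a .+ 2f0 .* b    # fused broadcast, executed on the GPU
s = sum(c)           # reduction back to a scalar
h = Array(c)         # copy the result to host memory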
The above array abstractions are all implemented using Metal kernels written in Julia. These kernels follow a programming style similar to Julia's other GPU back-ends, and therefore deviate from how kernels are implemented in Metal C (e.g., indexing intrinsics are functions rather than arguments, arbitrary aggregate arguments are supported, etc.):
julia> function vadd(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + b[i]
           return
       end
vadd (generic function with 1 method)
julia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);
julia> @metal threads=length(c) vadd(a, b, c)
julia> Array(c)
1-element Vector{Int64}:
3
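For inputs larger than a single threadgroup, the kernel typically guards against out-of-bounds indices and the launch is split across multiple threadgroups. The sketch below assumes that @metal accepts a groups keyword alongside threads and that 256 threads per group is a valid size; check the @metal docstring for the exact launch parameters:

function vadd_checked(a, b, c)
    i = thread_position_in_grid_1d()
    if i <= length(c)          # guard: the grid may be larger than the array
        c[i] = a[i] + b[i]
    end
    return
end

len = 4096
a = MtlArray(rand(Float32, len)); b = MtlArray(rand(Float32, len)); c = similar(a)

threads = 256                  # assumed per-group thread count
groups  = cld(len, threads)    # enough groups to cover the whole array
@metal threads=threads groups=groups vadd_checked(a, b, c)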
Finally, all of the above functionality is made possible by interfacing with the Metal libraries through a small C library that wraps the Objective-C APIs. This library is built during the build phase of Metal.jl.
These low-level wrappers, along with some slightly higher-level Julia wrappers, are available in the MTL submodule exported by Metal.jl. All wrapped C functions and types start with the mt prefix, whereas the Julia wrappers are prefixed with Mtl:
julia> dev = MtlDevice(1)
MtlDevice:
name: Apple M1 Pro
lowpower: false
headless: false
removable: false
unified memory: true
registry id: 4294969448
transfer rate: 0
julia> dev.name
"Apple M1 Pro"
Julia wrappers for this small C library have been built with Clang.jl, and can be regenerated by running julia --project=res/ res/wrap.jl from the project folder.
The C library started as a fork of rcp/cmt, to which the original credit goes. This package builds on the experience of several Julia contributors to CUDA.jl, AMDGPU.jl and oneAPI.jl.