
Metal programming in Julia




With Metal.jl it's possible to program GPUs on macOS using the Metal programming framework.

The package is very much a work in progress: there are many bugs and lots of functionality is missing, so expect to have to make changes to the code if you want to use it. PRs are very welcome!

Requirements

  • Mac device with M-series chip
  • Julia 1.8 (this package currently requires LLVM 13)
  • macOS Catalina 10.15 or newer (might also work on 10.14)
  • Xcode Command Line Tools (for now; we intend to remove this requirement)

Quick start

As Metal.jl currently depends on bleeding-edge versions of certain dependencies, it is recommended to use the Manifest that is part of the repository. Start by cloning Metal.jl, opening a terminal in that directory, and starting Julia with the --project flag (or activating the environment via the Pkg REPL):

$ git clone https://github.com/JuliaGPU/Metal.jl && cd Metal.jl

$ julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.build()'

$ julia --project

julia> using Metal

julia> Metal.versioninfo()
macOS 12.2.0, Darwin 21.3.0

- Julia: 1.8.0-beta3
- LLVM: 13.0.1

1 device:
- Apple M1 Pro (64.000 KiB allocated)

Array abstraction

The easiest way to work with Metal.jl is through its array abstraction. The MtlArray type is meant both as a convenient container for device memory and as a data-parallel abstraction for using the GPU without writing your own kernels:

julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
 1

julia> a .+ 1
1-element MtlArray{Int64, 1}:
 2

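As a slightly larger sketch, assuming a working Metal.jl installation on an Apple-silicon Mac, the snippet below combines two arrays in one dotted expression and copies the result back to host memory. It uses Float32 because Metal GPUs do not support 64-bit floating point; the variable names are illustrative only.

```julia
using Metal

# Float32 rather than Float64: the Metal Shading Language has no double type.
x = MtlArray(Float32[1, 2, 3, 4])
y = MtlArray(Float32[10, 20, 30, 40])

# Julia fuses the dotted operations into a single broadcast,
# which the array abstraction executes on the GPU.
z = 2f0 .* x .+ y

# Copy the result from device memory back to an ordinary Vector{Float32}.
host_z = Array(z)
```

Keeping the whole computation in one fused broadcast avoids allocating a temporary GPU array for the intermediate 2f0 .* x.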
Kernel programming

The above array abstractions are all implemented using Metal kernels written in Julia. These kernels follow a programming style similar to Julia's other GPU back-ends, and therefore deviate from how kernels are written in Metal C (e.g., indexing intrinsics are functions rather than arguments, arbitrary aggregate arguments are supported, etc.):

julia> function vadd(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + b[i]
           return
       end
vadd (generic function with 1 method)

julia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);

julia> @metal threads=length(c) vadd(a, b, c)

julia> Array(c)
1-element Vector{Int64}:
 3

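The example above launches exactly one thread per element in a single threadgroup, whose size is limited by the hardware (typically at most 1024 threads on Apple GPUs). Larger arrays are therefore split over several threadgroups, and when the element count is not a multiple of the group size some threads fall past the end of the array, so kernels usually guard their index. The plain-Julia sketch below (no GPU required) emulates such a 1-D launch on the CPU purely to illustrate the indexing; emulate_launch! and vadd_body are hypothetical helpers, not part of Metal.jl, and the global index is 1-based in the same way thread_position_in_grid_1d() is used above.

```julia
# Hypothetical CPU emulation of a 1-D grid launch; illustrates indexing only.
function emulate_launch!(kernel, threads::Int, groups::Int, args...)
    for g in 1:groups, t in 1:threads
        # 1-based global index, playing the role of thread_position_in_grid_1d()
        i = (g - 1) * threads + t
        kernel(i, args...)
    end
    return nothing
end

# Hypothetical kernel body that receives its index explicitly.
# The bounds check skips the surplus threads in the last, partially used group.
function vadd_body(i, a, b, c)
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

n = 1000
threads = 256
groups = cld(n, threads)   # round up: 4 groups, i.e. 1024 threads for 1000 elements

a = collect(1:n)
b = fill(2, n)
c = similar(a)
emulate_launch!(vadd_body, threads, groups, a, b, c)
```

On real hardware the same bounds check would live inside a kernel launched with @metal across several threadgroups; check the Metal.jl documentation for the launch keywords supported by your version.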
Metal API wrapper

Finally, all of the above functionality is made possible by interfacing with the Metal libraries through a small C library that wraps the Objective-C APIs. This library is built during the build phase of Metal.jl.

These low-level wrappers, along with some slightly higher-level Julia wrappers, are available in the MTL submodule exported by Metal.jl. All wrapped C functions and types start with the mt prefix, whereas the Julia wrappers are prefixed with Mtl:

julia> dev = MtlDevice(1)
 name:             Apple M1 Pro
 lowpower:         false
 headless:         false
 removable:        false
 unified memory:   true
 registry id:      4294969448
 transfer rate:    0

julia> dev.name
"Apple M1 Pro"

The Julia wrappers for this small C library were generated with Clang.jl and can be regenerated by running julia --project=res/ res/wrap.jl from the project folder.

Acknowledgements

The C library started as a fork of rcp/cmt, which deserves the original credit. This package builds upon the experience of several Julia contributors to CUDA.jl, AMDGPU.jl and oneAPI.jl.