Metal.jl

Metal programming in Julia

Primary language: Julia. License: MIT.

With Metal.jl it's possible to program GPUs on macOS using the Metal programming framework.

The package is very much a work in progress: there are many bugs, and a lot of functionality is missing. Expect to have to make changes to the code if you want to use it. PRs are very welcome!

Requirements

  • Mac device with M-series chip
  • Julia 1.8 (this package currently requires LLVM 13)
  • macOS Catalina 10.15 or newer (might also work on 10.14)
  • Xcode Command Line Tools (required for now; we plan to remove this requirement)

Quick start

As Metal.jl currently depends on bleeding-edge versions of certain dependencies, it is recommended to use the Manifest that is part of the repository. Start by cloning Metal.jl, open a terminal in that directory, and start Julia with the --project flag (or activate the environment via the Pkg REPL):

$ git clone https://github.com/JuliaGPU/Metal.jl && cd Metal.jl

$ julia --project -e 'using Pkg; Pkg.instantiate(); Pkg.build()'

$ julia --project

julia> using Metal

julia> Metal.versioninfo()
macOS 12.2.0, Darwin 21.3.0

Toolchain:
- Julia: 1.8.0-beta3
- LLVM: 13.0.1

1 device:
- Apple M1 Pro (64.000 KiB allocated)
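
The alternative mentioned above, activating the environment from inside Julia rather than passing --project, could look roughly like the following. This is a minimal sketch that assumes Julia was started in the root of the cloned Metal.jl directory; running the instantiate step before the build step is an assumption about the intended order.

```julia
# Assumes the current working directory is the root of the cloned Metal.jl repository.
using Pkg

Pkg.activate(".")    # use the Project.toml and Manifest.toml found in this directory
Pkg.instantiate()    # install the dependency versions recorded in the Manifest
Pkg.build()          # run the package's build step

using Metal
Metal.versioninfo()  # print toolchain and device information
```

In the Pkg REPL (entered by typing ] at the julia> prompt), the equivalent commands are activate ., instantiate and build.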

Array abstraction

The easiest way to work with Metal.jl is by using its array abstraction. The MtlArray type is meant both to be a convenient container for device memory and to provide a data-parallel abstraction for using the GPU without writing your own kernels:

julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
 1

julia> a .+ 1
1-element MtlArray{Int64, 1}:
 2
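
To make the data-parallel side of this a little more concrete, here is a hedged sketch of a slightly larger broadcast. It only uses the MtlArray constructor, dot-broadcasting and Array(...) shown in this README. The array size and the use of Float32 are choices made for illustration, and given the package's work-in-progress state, other array operations may or may not be available.

```julia
using Metal

# Host data. Float32 is chosen for illustration (Metal has no 64-bit floats).
x = rand(Float32, 1024)
y = rand(Float32, 1024)

# Upload both vectors to the GPU.
dx = MtlArray(x)
dy = MtlArray(y)

# Dot-broadcasting runs on the GPU; no hand-written kernel is needed.
dz = 2f0 .* dx .+ dy

# Download the result and compare it with the same computation on the CPU.
z = Array(dz)
@assert z ≈ 2f0 .* x .+ y
```

The comparison uses ≈ rather than == because the GPU and CPU may round floating-point operations slightly differently.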

Kernel programming

The above array abstractions are all implemented using Metal kernels written in Julia. These kernels follow a programming style similar to that of Julia's other GPU back-ends, and so deviate from how kernels are implemented in Metal C (e.g., indexing intrinsics are functions rather than arguments, arbitrary aggregate arguments are supported, etc.):

julia> function vadd(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + b[i]
           return
       end
vadd (generic function with 1 method)

julia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);

julia> @metal threads=length(c) vadd(a, b, c)

julia> Array(c)
1-element Vector{Int64}:
 3
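
The vadd kernel above assumes that exactly one thread is launched per element. As a sketch of a common defensive variation, the kernel below adds a bounds guard so that surplus threads do nothing. The name vadd_guarded and the choice of 4 threads for 3 elements are hypothetical and only for illustration; the indexing intrinsic and the @metal threads=... launch syntax are the ones used above.

```julia
using Metal

# Hypothetical variant of `vadd`: threads whose index falls outside the
# output array skip the store instead of indexing out of bounds.
function vadd_guarded(a, b, c)
    i = thread_position_in_grid_1d()
    if i <= length(c)
        c[i] = a[i] + b[i]
    end
    return
end

a = MtlArray(Float32[1, 2, 3])
b = MtlArray(Float32[10, 20, 30])
c = similar(a)

# Launch one more thread than there are elements; the extra thread is a no-op.
@metal threads=4 vadd_guarded(a, b, c)

# Copy the result back to the host and check it.
result = Array(c)
@assert result == Float32[11, 22, 33]
```

A guard like this matters once the number of launched threads is rounded up to a convenient size rather than matched exactly to the array length.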

Metal API wrapper

Finally, all of the above functionality is made possible by interfacing with the Metal libraries through a small C library that wraps the Objective-C APIs. This library is built during the build phase of Metal.jl.

These low-level wrappers, along with some slightly higher-level Julia wrappers, are available in the MTL submodule exported by Metal.jl. All wrapped C functions and types start with the mt prefix, whereas the Julia wrappers are prefixed with Mtl:

julia> dev = MtlDevice(1)
MtlDevice:
 name:             Apple M1 Pro
 lowpower:         false
 headless:         false
 removable:        false
 unified memory:   true
 registry id:      4294969448
 transfer rate:    0

julia> dev.name
"Apple M1 Pro"

The Julia wrappers for this small C library were generated with Clang.jl, and can be regenerated by running julia --project=res/ res/wrap.jl from the project folder.

Acknowledgements

The C library started as a fork of rcp/cmt, to which the original credit goes. This package builds upon the experience of several Julia contributors to CUDA.jl, AMDGPU.jl and oneAPI.jl.