hexops/mach

build: self-hosted compiler progress

slimsag opened this issue · 13 comments

This is the tracking issue for getting all Mach examples/libraries working with the new self-hosted Zig compiler, which will be enabled by default in Zig v0.11 and is already enabled by default in all nightly Zig builds.

Status

  • aarch64-macos: tested 2022-09-09
  • x86_64-linux: tested 2022-09-08
  • x86_64-windows: tested 2022-09-09
aarch64-macos | linux | windows | test
 |  |  | cd libs/ecs && zig build test -fno-stage1
 |  |  | cd libs/ecs && zig build test -fstage1
 |  |  | cd libs/gpu && zig build test
 |  |  | cd libs/gpu && zig build run-example
 |  |  | cd libs/glfw && zig build test
 |  |  | cd libs/sysaudio && zig build test
 |  |  | cd libs/sysaudio && zig build run-example-soundio-sine-wave
 |  |  | cd libs/freetype && zig build test
 |  |  | cd libs/freetype && zig build run-example-single-glyph -- 'a'
 |  |  | cd libs/freetype && zig build run-example-glyph-to-svg
 |  |  | cd libs/gpu-dawn && zig build
 |  |  | zig build-exe --main-pkg-path . ./tools/html-generator.zig
🍎 | 🍎 | 🍎 | zig build run-example-triangle -fstage1 -Dtarget=wasm32-freestanding-musl
 |  |  | zig build run-shaderexp
 |  |  | zig build run-example-advanced-gen-texture-light
 |  |  | zig build run-example-boids
⚠️ | ⚠️ | ⚠️ | zig build run-example-ecs-app
 |  |  | zig build run-example-fractal-cube
 |  |  | zig build run-example-gkurve
 |  |  | zig build run-example-image-blur
 |  |  | zig build run-example-instanced-cube
 |  |  | zig build run-example-map-async
 |  |  | zig build run-example-rotating-cube
 |  |  | zig build run-example-textured-cube
 |  |  | zig build run-example-triangle
 |  |  | zig build run-example-triangle-msaa
 |  |  | zig build run-example-two-cubes
 |  |  | zig build run-example-cubemap
 |  |  | zig build example-triangle -Dtarget=wasm32-freestanding-musl
🍎 | 🍎 | 🍎 | zig build run-example-triangle -fstage1 -Dtarget=wasm32-freestanding-musl
⚠️ | ⚠️ | ⚠️ | zig build example-sysaudio -Dtarget=wasm32-freestanding-musl
🍎 | 🍎 | 🍎 | zig build run-example-sysaudio -fstage1 -Dtarget=wasm32-freestanding-musl
⚠️ | ? | ⚠️ | zig build
 |  |  | zig build test

Known issues

  • The HTTP server we use for running WebAssembly examples, apple_pie, uses async, so our WASM examples can no longer auto-start a web server and open the browser. Not the end of the world.

Zig issues we are blocked on

Zig issues we have hacky workarounds for

Search for TODO(self-hosted)

Zig issues that are problems for us, but haven't been filed yet

NOT YET FILED:

  1. Our map-async example (which does not use Zig async) required adding a useless field to the App file struct; otherwise *App is treated as *const App for some reason (error: expected type '*main.main', found '*const main.main'). Not sure why; need to make a minimal repro and file a bug. a4ddfb6

I made a quick test on M1 and Intel Macs. Stage2 has made a lot of progress! Running boids is now down to a single compile error.

M1 Max

 ➜  mach git:(main) stage1 version
0.10.0-dev.2211+f32928c50
➜  mach git:(main) stage1 build run-example-boids
mach: found Metal backend on Discrete GPU adapter: Apple M1 Max, Metal driver on macOS Version 12.3.1 (Build 21E258)
Frame 60
Frame 120
Frame 180
…
➜  mach git:(main) stage2 build run-example-boids
/Users/jonas/src/zig/mach/build.zig:134:77: error: cannot @bitCast to 'gpu.libs.mach-glfw.build.Options', struct does not have a guaranteed in-memory layout
            .glfw_options = @bitCast(@import("gpu/libs/mach-glfw/build.zig").Options, options.glfw_options),

Intel Macbook Pro 2019

➜  mach git:(main) stage1 version
0.10.0-dev.2211+f32928c50

➜  mach git:(main) stage1 build run-example-boids
mach: found Metal backend on Integrated GPU adapter: Intel(R) UHD Graphics 630, Metal driver on macOS Version 12.3.1 (Build 21E258)
Frame 60
Frame 120
…
➜  mach git:(main) stage2 build run-example-boids
/Users/jonas/src/zig/mach/build.zig:134:77: error: cannot @bitCast to 'gpu.libs.mach-glfw.build.Options', struct does not have a guaranteed in-memory layout
            .glfw_options = @bitCast(@import("gpu/libs/mach-glfw/build.zig").Options, options.glfw_options),

Wow, nice!

That bitcast error is interesting ... the reason the bitcast is there is that we have the exact same code symlinked into the libs/ directory (so $REPO/gpu/libs/mach-glfw is a symlink to $REPO/glfw). That works around the fact that dependencies must be below the build.zig file (relative imports are not allowed) and that there is no package manager we can use to avoid this.

Without this, we would have to duplicate all code from glfw/ in this repository (which gets published as a standalone library in a separate repo via some git-fu github.com/hexops/mach-glfw) into the gpu/libs/mach-glfw directory OR make that a Git submodule... to our own repository 🤮

I would hope that there is a way we can tell the Zig compiler still "hey, trust me, I really know what I'm doing - just unsafely cast type A to type B", but maybe not.

For a case like this that would be nice. I don't know how it's supposed to work now. Stage2/3 is moving really fast, so perhaps some corner case just isn't implemented yet. I wonder if we would get the same error on Linux and Windows?
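
For reference, one possible alternative to the @bitCast discussed above is to copy the fields by name. This is only a sketch, not what the repository does: OptionsA, OptionsB, their fields, and convert are hypothetical names standing in for the two structurally identical Options types. It only works when every field has the same name and a type that coerces directly, so nested structs that are themselves duplicated would need the same treatment.

```zig
const std = @import("std");

// Hypothetical stand-ins for two identical Options structs that the
// compiler sees as different types because they come from different files.
const OptionsA = struct { vulkan: bool = true, metal: bool = false, samples: u8 = 4 };
const OptionsB = struct { vulkan: bool = true, metal: bool = false, samples: u8 = 4 };

// Copy every field of `from` into a new value of type `To`, matching by name.
fn convert(comptime To: type, from: anytype) To {
    var out: To = undefined;
    inline for (std.meta.fields(To)) |field| {
        @field(out, field.name) = @field(from, field.name);
    }
    return out;
}
```

Usage would look like convert(OptionsB, a), where a is an OptionsA value. Unlike a bitcast, this does not depend on the in-memory layout of either struct.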

This is some stuff I found out trying to test mach/glfw with -fno-stage1.

To compile the build.zig we just need to change thisDir() to (comptime thisDir()). This is maybe due to the fact that the stage2 compiler can't infer as well when something should be resolved at comptime or at runtime.
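
A minimal sketch of that change, assuming the self-hosted compiler. baseDir and sourcePath are hypothetical names; baseDir stands in for thisDir() and returns a fixed string so the example is self-contained.

```zig
const std = @import("std");

// Hypothetical stand-in for thisDir(): returns a fixed directory string.
fn baseDir() []const u8 {
    return "libs/glfw";
}

// `++` needs comptime-known operands, so the call is explicitly forced to
// run at compile time before concatenating.
fn sourcePath() []const u8 {
    return (comptime baseDir()) ++ "/src/main.zig";
}

pub fn main() void {
    std.debug.print("{s}\n", .{sourcePath()});
}
```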

When running zig build test -fno-stage1 we get many errors:

/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:624:29: error: invalid type given to std.mem.Span
                    else => @compileError("invalid type given to std.mem.Span"),
                            ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:669:31: note: called from here
pub fn span(ptr: anytype) Span(@TypeOf(ptr)) {
                              ^

Not sure of the above.

/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:200:24: error: ambiguous reference
    const window = glfw.Window.create(640, 480, "Hello, Zig!", null, null, .{}) catch |err| {
                       ^
/home/Zargio/Documents/Github/mach/glfw/src/main.zig:28:5: note: declared here
pub const Window = @import("Window.zig");
    ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:4:1: note: declared here
const Window = @import("Window.zig");
^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:225:24: error: ambiguous reference
    const window = glfw.Window.create(640, 480, "Hello, Zig!", null, null, .{}) catch |err| {
                       ^
/home/Zargio/Documents/Github/mach/glfw/src/main.zig:28:5: note: declared here
pub const Window = @import("Window.zig");
    ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:4:1: note: declared here
const Window = @import("Window.zig");
^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:242:24: error: ambiguous reference
    const window = glfw.Window.create(640, 480, "Hello, Zig!", null, null, .{}) catch |err| {
                       ^
/home/Zargio/Documents/Github/mach/glfw/src/main.zig:28:5: note: declared here
pub const Window = @import("Window.zig");
    ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:4:1: note: declared here
const Window = @import("Window.zig");
^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:259:24: error: ambiguous reference
    const window = glfw.Window.create(640, 480, "Hello, Zig!", null, null, .{}) catch |err| {
                       ^
/home/Zargio/Documents/Github/mach/glfw/src/main.zig:28:5: note: declared here
pub const Window = @import("Window.zig");
    ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:4:1: note: declared here
const Window = @import("Window.zig");

Probably an error when resolving namespaces? Basically, opengl.zig already has const Window = @import("Window.zig"); but we are calling glfw.Window.... The fix is to remove glfw. and just use Window.

/home/Zargio/Documents/Github/mach/glfw/src/internal_debug.zig:10:38: error: unable to resolve comptime value
    if (debug_mode) std.debug.assert(glfw_initialized);
                                     ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:189:37: note: called from here
    internal_debug.assertInitialized();
                                    ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:251:28: note: called from here
    _ = glfw.getProcAddress("foobar");
                           ^

The problem above is probably similar to the thisDir() one, only this time it's the opposite: the compiler tries to resolve std.debug.assert() at comptime, but glfw_initialized is a runtime value (maybe since it's a global value the compiler assumes it should be comptime?). A similar problem happens when building gpu-dawn, which gives this error:

/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/compress/gzip.zig:157:11: error: unable to resolve comptime value
    defer gzip_stream.deinit();

Then we have:

/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:140:47: error: expected [*c]const u8, found [:0]const u8
    const supported = c.glfwExtensionSupported(extension);
                                              ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:268:32: note: called from here
    _ = glfw.extensionSupported("foobar") catch |err| std.debug.print("failed to check if extension supported, error={}\n", .{err});
                               ^

The easy fix for this is to just change c.glfwExtensionSupported(extension) to c.glfwExtensionSupported(extension.ptr), though this still leaves the problem that zig translate-c transforms char* into many-item pointers instead of null-terminated pointers.
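
A minimal sketch of the .ptr fix without the GLFW dependency. cStringLength is a hypothetical stand-in for a translated C function that takes a [*c]const u8, and extensionNameLength is a hypothetical wrapper in the style of the glfw bindings.

```zig
const std = @import("std");

// Hypothetical stand-in for a translated C function: takes a C pointer and
// counts the bytes before the terminating 0.
fn cStringLength(s: [*c]const u8) usize {
    var n: usize = 0;
    while (s[n] != 0) : (n += 1) {}
    return n;
}

fn extensionNameLength(extension: [:0]const u8) usize {
    // Pass the underlying [*:0]const u8 pointer rather than the slice;
    // that pointer coerces to [*c]const u8.
    return cStringLength(extension.ptr);
}
```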

/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:393:13: error: element access of non-indexable type '*allowzero ?*.home.Zargio.Documents.Github.mach.glfw.zig-cache.o.d78c85e96595ebd71edec29885c8bfc7.cimport.struct_GLFWmonitor'
            slice[i] = Monitor{ .handle = monitors[i].? };
            ^
/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:485:32: note: called from here
    const monitors = try getAll(allocator);
                               ^

The allowzero... is the result of a simple
const slice = try allocator.alloc(Monitor, @intCast(u32, count)); that returns a type that the compiler can't interpret as a slice, hence why slice[i] is non-indexable.

/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:167:58: error: expected [*:0]const u8, found *const allowzero u8
    if (c.glfwGetMonitorName(self.handle)) |name| return name;
                                                         ^
/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:548:22: note: called from here
        _ = m.getName();
                     ^

Again, this is a problem of zig translate-c and strings, solved with return @ptrCast([*:0]const u8, name).

/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:241:16: error: element access of non-indexable type '*const allowzero .home.Zargio.Documents.Github.mach.glfw.zig-cache.o.d78c85e96595ebd71edec29885c8bfc7.cimport.struct_GLFWvidmode'
        while (i < count) : (i += 1) {
               ^
/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:589:42: note: called from here
        const modes = try m.getVideoModes(allocator);
                                         ^

Same thing and code as with Monitor and alloc before; this time the reported line is above the line doing the indexing, so maybe there is also a bug in the error messages.

/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:624:29: error: invalid type given to std.mem.Span
                    else => @compileError("invalid type given to std.mem.Span"),
                            ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:669:31: note: called from here
pub fn span(ptr: anytype) Span(@TypeOf(ptr)) {
                              ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:624:29: error: invalid type given to std.mem.Span
                    else => @compileError("invalid type given to std.mem.Span"),
                            ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:669:31: note: called from here
pub fn span(ptr: anytype) Span(@TypeOf(ptr)) {
                              ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:624:29: error: invalid type given to std.mem.Span
                    else => @compileError("invalid type given to std.mem.Span"),
                            ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:669:31: note: called from here
pub fn span(ptr: anytype) Span(@TypeOf(ptr)) {
                              ^

Same as the first error, couldn't figure out where it comes from.

/home/Zargio/Documents/Github/mach/glfw/src/Joystick.zig:606:18: error: no field named 'setUserPointer' in struct 'Joystick.Joystick'
    _ = joystick.setUserPointer;
                 ^
/home/Zargio/Documents/Github/mach/glfw/src/Joystick.zig:1:1: note: struct declared here
//! Represents a Joystick or gamepad
^
/home/Zargio/Documents/Github/mach/glfw/src/Joystick.zig:617:18: error: no field named 'getUserPointer' in struct 'Joystick.Joystick'
    _ = joystick.getUserPointer;
                 ^
/home/Zargio/Documents/Github/mach/glfw/src/Joystick.zig:1:1: note: struct declared here
//! Represents a Joystick or gamepad
^

Same problem with namespacing as glfw.Window, probably; the fix is to remove joystick. and use just the function names.

/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:624:29: error: invalid type given to std.mem.Span
                    else => @compileError("invalid type given to std.mem.Span"),
                            ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:669:31: note: called from here
pub fn span(ptr: anytype) Span(@TypeOf(ptr)) {

Same as the first error.

Note that there may be more errors that weren't caught, since the compiler failed before it could get to them.

One more issue with compiling with stage2 is that mach-gpu uses a lot of function pointers, but the semantics of function pointers changed in stage2: fn (args) ret becomes *const fn (args) ret.
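
A small sketch of the new spelling, assuming the self-hosted compiler. Callback, Handler, double, and run are hypothetical names and not taken from mach-gpu.

```zig
const std = @import("std");

fn double(x: i32) i32 {
    return x * 2;
}

// With the self-hosted compiler, a function pointer stored at runtime is
// spelled `*const fn (...) ...`; a bare `fn (...) ...` is a function body type.
const Callback = *const fn (i32) i32;

const Handler = struct {
    callback: Callback,
};

fn run(handler: Handler, x: i32) i32 {
    return handler.callback(x);
}
```

A Handler would be built with Handler{ .callback = &double } and invoked through run.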

/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:624:29: error: invalid type given to std.mem.Span
                    else => @compileError("invalid type given to std.mem.Span"),
                            ^
/media/data/Projects/Compilers_or_Interpreters/zig/lib/std/mem.zig:669:31: note: called from here
pub fn span(ptr: anytype) Span(@TypeOf(ptr)) {
                              ^

Update on this error: a C function that returns a string is given the type *const allowzero u8, which fails std.mem.span() as used in glfw/src/{clipboard.zig,key.zig,main.zig,Joystick.zig}. Casting the C strings with @ptrCast([*:0]const u8, name) solves the compilation error.

/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:393:13: error: element access of non-indexable type '*allowzero ?*.home.Zargio.Documents.Github.mach.glfw.zig-cache.o.d78c85e96595ebd71edec29885c8bfc7.cimport.struct_GLFWmonitor'
            slice[i] = Monitor{ .handle = monitors[i].? };
            ^
/home/Zargio/Documents/Github/mach/glfw/src/Monitor.zig:485:32: note: called from here
    const monitors = try getAll(allocator);

This was not a problem with alloc(): the culprit was monitors, obtained from a C function (and the same goes for the vidmode case).
To fix it, we cast the C pointers: @ptrCast([*c]const ?*c.GLFWmonitor, monitors) and @ptrCast([*c]const c.GLFWvidmode, modes).

This leaves:

/home/Zargio/Documents/Github/mach/glfw/src/internal_debug.zig:10:38: error: unable to resolve comptime value
    if (debug_mode) std.debug.assert(glfw_initialized);
                                     ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:189:37: note: called from here
    internal_debug.assertInitialized();
                                    ^
/home/Zargio/Documents/Github/mach/glfw/src/opengl.zig:251:28: note: called from here
    _ = glfw.getProcAddress("foobar");

as the last error to solve

Solved the last error for compiling mach with glfw. Basically, _ = glfw.getProcAddress("foobar") was being called at compile time because GLproc, the return type of getProcAddress, used the old function declaration.

Will test with the latest version of mach now and send a PR

Found a stage2 bug while compiling gpu/ with -fno-stage1:

(struct {
        pub fn createShaderModule(ptr: *anyopaque, descriptor: *const ShaderModule.Descriptor) ShaderModule {
            @compileLog(descriptor.code.wgsl); // @as([*:0]const u8, [runtime value])
            switch (descriptor.code) {
                .wgsl => |wgsl| {
                    @compileLog(wgsl); // @as([]const u32, [runtime value])
                    const wgsl_desc = c.WGPUShaderModuleWGSLDescriptor{
                        .chain = c.WGPUChainedStruct{
                            .next = null,
                            .sType = c.WGPUSType_ShaderModuleWGSLDescriptor,
                        },
                        .source = wgsl.ptr,
                    };
                    const desc = c.WGPUShaderModuleDescriptor{
                        .nextInChain = @ptrCast(*const c.WGPUChainedStruct, &wgsl_desc),
                        .label = if (descriptor.label) |l| l else null,
                    };
                    return wrapShaderModule(c.wgpuDeviceCreateShaderModule(@ptrCast(c.WGPUDevice, ptr), &desc));
                },
                .spirv => |spirv| {
                    @compileLog(spirv); // Not written to stdout, maybe the branch is optimized away? Or it fails before reaching this?
                    const spirv_desc = c.WGPUShaderModuleSPIRVDescriptor{
                        .chain = c.WGPUChainedStruct{
                            .next = null,
                            .sType = c.WGPUSType_ShaderModuleSPIRVDescriptor,
                        },
                        .code = spirv.ptr,
                        .codeSize = @intCast(u32, spirv.len),
                    };
                    const desc = c.WGPUShaderModuleDescriptor{
                        .nextInChain = @ptrCast(*const c.WGPUChainedStruct, &spirv_desc),
                        .label = if (descriptor.label) |l| l else null,
                    };
                    return wrapShaderModule(c.wgpuDeviceCreateShaderModule(@ptrCast(c.WGPUDevice, ptr), &desc));
                },
            }
        }
    }).createShaderModule,

This is in gpu/src/NativeInstance.zig; descriptor is defined as:

pub const Descriptor = struct {
    label: ?[*:0]const u8 = null,
    code: union(CodeTag) {
        wgsl: [*:0]const u8,
        spirv: []const u32,
    },
};

The problem here is that the switch uses the wrong type for wgsl: the capture is reported as []const u32 instead of [*:0]const u8.

Found out why the union is bugged.

This is how it is now, and also the code that swaps the union types.

pub const CodeTag = enum {
    spirv,
    wgsl,
};

pub const Descriptor = struct {
    label: ?[*:0]const u8 = null,
    code: union(CodeTag) {
        wgsl: [*:0]const u8,
        spirv: []const u32,
    },
};

But by changing the order in CodeTag:

pub const CodeTag = enum {
    wgsl,
    spirv,
};

We get the correct types. I suspect the stage2 compiler uses the order instead of the names to assign the types.
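
A self-contained sketch of the workaround, with the enum order matching the union field order. Code and payloadLength are hypothetical names used only for illustration; the tag and field names follow the Descriptor above.

```zig
const std = @import("std");

// Tag order kept identical to the union field order below (the workaround).
const CodeTag = enum {
    wgsl,
    spirv,
};

const Code = union(CodeTag) {
    wgsl: [*:0]const u8,
    spirv: []const u32,
};

// Each capture should have the payload type declared for that field:
// a null-terminated string for wgsl, a slice of words for spirv.
fn payloadLength(code: Code) usize {
    return switch (code) {
        .wgsl => |source| std.mem.len(source),
        .spirv => |words| words.len,
    };
}
```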

@PiergiorgioZagaria nice work, thanks so much for continuing to dig into this. I updated the issue description just now with what (I think?) the status quo is w.r.t. stage2 support, let me know if I should update anything there

This is a bug I found in mach-freetype.
Basically we have 4 packages:

const c_pkg = std.build.Pkg{
    .name = "c",
    .source = .{ .path = thisDir() ++ "/src/c.zig" },
};

const utils_pkg = std.build.Pkg{
    .name = "utils",
    .source = .{ .path = thisDir() ++ "/src/utils.zig" },
};

pub const pkg = std.build.Pkg{
    .name = "freetype",
    .source = .{ .path = thisDir() ++ "/src/freetype/main.zig" },
    .dependencies = &.{ c_pkg, utils_pkg },
};

pub const harfbuzz_pkg = std.build.Pkg{
    .name = "harfbuzz",
    .source = .{ .path = thisDir() ++ "/src/harfbuzz/main.zig" },
    .dependencies = &.{ c_pkg, utils_pkg, pkg },
};

The problem is that when trying to compile without adding utils_pkg to our executable like this:

const main_tests = b.addTest("test/main.zig");
main_tests.setBuildMode(mode);
main_tests.setTarget(target);
main_tests.addPackage(c_pkg);
main_tests.addPackage(pkg);
link(b, main_tests, .{
    .freetype = .{
        .ft_config_path = "./test/ft",
        .brotli = true,
    },
});

We get the error:

/mach/freetype/src/utils.zig:1:21: error: unable to open 'std': PackageNotFound
const std = @import("std");

Adding main_tests.addPackage(utils_pkg) and using:

comptime {
    _ = @import("utils");
}

to actually import it (without that, the compiler never evaluates it because of lazy evaluation) seems to solve the problem.
Could this be related to ziglang/zig#9204?

aarch64-macos now mostly working.

Workarounds employed:

  • Our map-async example (which does not use Zig async) required adding a useless field to the App file struct; otherwise *App is treated as *const App for some reason (error: expected type '*main.main', found '*const main.main'). Not sure why; need to make a minimal repro and file a bug. a4ddfb6
  • pre-translated @cImports to work around ziglang/zig#12483 (e.g. 80e127b and 5193224)
  • @embedFile seems to have regressed? bc5e2fe
  • Removing unexpected symbol from @cImport/translated file manually:
error: expected ';' after declaration
pub const MPCopyrightNotice = "Copyright � 1995-2020 Apple Computer, Inc.\n";
                                         ^

Known issues

Failing even with stage1 and -fstage1!

  • image-blur fails with obscure error: FileNotFound (even with -fstage1!)
  • textured-cube fails with obscure error: FileNotFound (even with -fstage1!)
  • gkurve doesn't work (depends on freetype which has @cImport issues)
  • WASM examples zig build run-example-triangle -Dtarget=wasm32-freestanding-musl

Failing with self-hosted/stage3

  • ecs-app crashes (compiler bug, need to file)
  • image-blur fails with obscure error: FileNotFound (even with -fstage1!)
  • textured-cube fails with obscure error: FileNotFound (even with -fstage1!)
  • gkurve doesn't work (depends on freetype which has @cImport issues)
  • Still has @cImport issues:
    • sysaudio
    • freetype
  • Need to limit @cImport workaround to Darwin hosts
  • When exiting applications, there is a crash freeing GLFW cursors (seems legit)
  • WASM examples not working (apple_pie needs updating and relies on async/await): zig build run-example-triangle -Dtarget=wasm32-freestanding-musl
    • Even with -fstage1 on aarch64-macos host, it doesn't work. Obscure error: FileNotFound

Confirmed working

  • All Mach examples
  • 137 mach/glfw tests
  • mach/gpu example
  • shaderexp

apple_pie has since been replaced with a web server dedicated to WASM serving: https://github.com/hexops/mach/tree/main/tools/wasmserve

We're now fully on the self-hosted compiler, so closing.