foxnne/pixi

[macOS] Window freezes and flashes magenta unpredictably on random file loads

Closed this issue · 9 comments

foxnne commented

This is a long-standing issue that started after moving to zig-gamedev. I'm unsure of the cause, and it seems rather random: some file loads trigger it and some don't, and it isn't the same files every time. It can be forced to happen by loading many files, e.g. packing a full project.

I suspect the issue is related to this; however, I'm unable to verify that until the Dawn library is updated.

I did try to debug the application using Xcode, which did at least reveal the following error messages:

2023-07-11 13:09:07.434221-0500 Pixi[71341:647781] Metal GPU Frame Capture Enabled
2023-07-11 13:09:07.434349-0500 Pixi[71341:647781] Metal API Validation Enabled
info: [zgpu] High-performance device has been selected:
info: [zgpu]   Name: Apple M2
info: [zgpu]   Driver: Metal driver on macOS Version 13.4.1 (Build 22F82)
info: [zgpu]   Adapter type: discrete_gpu
info: [zgpu]   Backend type: metal
2023-07-11 13:09:18.445450-0500 Pixi[71341:647781] +[CATransaction synchronize] called within transaction
2023-07-11 13:09:19.581168-0500 Pixi[71341:647781] [default] CGSWindowShmemCreateWithPort failed on port 0
2023-07-11 13:09:45.459204-0500 Pixi[71341:648487] Execution of the command buffer was aborted due to an error during execution. Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)
...
2023-07-11 13:09:45.465631-0500 Pixi[71341:648487] Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
foxnne commented

After moving to mach-core, which at the time was the only way to get an updated Dawn binary, this issue became nearly solved. The massive memory leak we previously had in Dawn now seems to be gone, or at least very much reduced.

However, the bug remains. It's much harder to predict now: I can spam the Pack Project button, which loads several files at once, packs everything, and releases the files, and the issue will not occur for a large number of attempts. I haven't found a reliable way to reproduce it; all I know is that it eventually happens on a file load.

I have distributed debug statements through the loading function and observed that when I did eventually trigger the bug, all of the debug statements were still written. This leads me to believe that it's still Dawn having the issue on macOS.

As much as I would love to fix this now, I just don't know where to begin. I have spoken with slimsag, and it seems the initial leak was not present in his testing on an older macOS version. This bug could be something that only happens with Dawn on the OS version I'm on (13.4.1).

foxnne commented

I'm fairly certain this has to do with Dawn and other users are experiencing similar issues here: zig-gamedev/zig-gamedev#411


I also saw the magenta flashing on macOS 13.4.1. But my current memory freeze problems are on macOS 14.0. I get no flashing there, just a complete freeze that forces me to reboot (see zig-gamedev/zig-gamedev#411).

Update: If I run video streams or other memory-intensive things in the background, the GLFW apps flash magenta on macOS 14.0, too.

foxnne commented


Not sure if you have a handy way of testing, but I've noticed the behavior mainly happens for me when I'm loading files, which sounds similar to what you describe. In Pixi I can trigger a file load with a button press, and it's fairly easy to repeat the behavior that way.

Update: After updating to Sonoma 14.1.2, it seems this issue is worse. I experience the magenta screen and hard freeze far more frequently. I spoke with pdoane and it seems this issue is not present in sysgpu, mach's answer to Dawn. I believe there are a few blockers here before Pixi can use sysgpu but I'll be swapping over as soon as possible.


Thanks! I just updated to 14.1.2, haven't checked yet. Interesting that sysgpu is unaffected!

I've created a sysgpu branch that uses a set of compatible generated Imgui bindings. I'm really hoping to have it swapped over tomorrow, but swapping Imgui bindings takes a ton of changes. Once that's done, it will be a good test to confirm the issue isn't present in sysgpu's implementation, and if it works well on all platforms I'll merge it into the main branch and finally close this issue. It's been driving me crazy trying to draw assets with the constant freezes.

Good news! I've completed the painful task of switching Imgui bindings, which lets us easily swap between the Dawn and sysgpu backends. I can confirm I have had zero freezes or crashes with sysgpu. However, I will keep developing on the separate sysgpu branch until one remaining bug that breaks Windows builds is ironed out. Once that fix is merged, I'll merge the sysgpu branch into main and we can finally close this awful issue. :)

I've merged the sysgpu branch into main. There are currently a few misalignments between the development of Pixi and mach-sysgpu/mach-core, so sysgpu isn't the default yet. However, you can easily enable it with zig build run -Duse_sysgpu=true. We will stay pinned to the current version of mach-core until sysgpu is re-enabled.