zgpu: Memory leak in samples (Dawn leaks ~20KB per frame on macOS)
Pyrolistical opened this issue · 27 comments
It seems like all the samples are leaking memory. One can detect this by observing that a sample's memory usage goes up over time.
zig version: 0.10.0-dev.4249+11dce7894
$ zig build triangle_wgpu-run
info: [zgpu] High-performance device has been selected:
info: [zgpu] Name: Apple M1
info: [zgpu] Driver: Metal driver on macOS Version 12.6 (Build 21G115)
info: [zgpu] Adapter type: discrete_gpu
info: [zgpu] Backend type: metal
Here is the triangle sample's memory usage, captured once per second:
48 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
95216 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
96368 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
97424 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
98512 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
99504 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
100528 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
101552 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
102544 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
103584 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
104576 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
105584 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
106656 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
107696 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
108720 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
109712 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
110800 michal-z/zig-gamedev/zig-out/bin/triangle_wgpu
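The thread does not say which command produced the capture above, so the following is only a sketch of one way to take such a measurement and estimate the growth rate. It assumes the first column is the resident set size in kilobytes as printed by `ps -o rss=`. The function names `sample_rss` and `growth_per_sample` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; not part of zig-gamedev or Dawn.

# Print the resident set size (in KB, as reported by ps) of process PID,
# COUNT times, waiting INTERVAL seconds between samples.
# Usage: sample_rss PID [COUNT] [INTERVAL]
sample_rss() {
    pid=$1
    count=${2:-10}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # "rss=" suppresses the header line; tr strips padding spaces.
        rss=$(ps -o rss= -p "$pid" | tr -d ' ')
        # An empty result means the process is gone.
        if [ -z "$rss" ]; then
            return 1
        fi
        echo "$rss"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Read one sample per line on stdin (only the first field is used) and
# print the average growth per interval: (last - first) / (samples - 1).
# Integer division, so the result is truncated.
growth_per_sample() {
    first=
    last=
    n=0
    while read -r rss _rest || [ -n "$rss" ]; do
        if [ -z "$rss" ]; then
            continue
        fi
        if [ -z "$first" ]; then
            first=$rss
        fi
        last=$rss
        n=$((n + 1))
    done
    if [ "$n" -lt 2 ]; then
        return 1
    fi
    echo $(( (last - first) / (n - 1) ))
}
```

With the PID of the running sample, `sample_rss PID 16 1 | growth_per_sample` prints the average growth per second. Applied to the 16 readings above from 95216 to 110800, the result is 1038, which would be roughly 1 MB per second if the column is indeed in KB.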
In Activity Monitor memory usage seems to go up by 40 megabytes per second.
I commented out zgui and the leak remains.
The oldest commit that still runs the triangle demo for me is c0fcbfd1484d7d943248153533b2e553df291085
and it also leaks memory. I wonder if it's environment-specific. I don't have access to another computer right now.
The memory leak is faster when the sample isn't being shown on screen. This is likely because the frame rate is extremely high while the window is off-screen, which makes the leak more obvious.
I created a memory-leak branch to run the triangle sample with std.testing.allocator to see if it catches the problem, but it does not.
Run zig build test and it will launch the triangle sample with std.testing.allocator.
This is a bug in Dawn and it affects only macOS. This should be fixed after we update to newer Dawn versions. There was a discussion about this bug on Discord.
Thanks. I found it in the issue tracker https://bugs.chromium.org/p/dawn/issues/detail?id=1175
Which Discord held this discussion? Dawn doesn't have a Discord.
We were discussing this issue on the Zig game-dev Discord channel: https://discord.com/channels/605571803288698900/634812978994085888/1008307672484954213
Just re-checking. Still leaks memory with
- zig-gamedev 5d64954
- zig 0.11.0-dev.78+28288dcbb
Thanks for the info.
Is there any news on this? My machine just crashed for the first time after the program used all available app memory, so this bug is definitely still there.
- zig-gamedev de7b2bc
- zig 0.11.0-dev.2191+30427ff79
Yes, the bug is still there, unfortunately. They have decided to fix it after the WebGPU 1.0 release (which should happen this summer).
Gotcha! Thanks for the info!
@michal-z I got a notification that I believe says this was fixed. The Dawn issue tracker shows it as fixed as of 2 days ago. What are the steps to get this into zig-gamedev? I'm itching to get rid of that leak on macOS :D
That's good news! We need to re-compile and update our Dawn binaries for all platforms. I'm planning to do it in the coming weeks.
Any update on this?
Yes, and also let us know if you need help with anything! I started looking at it locally using the latest mach prebuilt, but I was getting a lot of errors I didn't know how to solve. I could look into that further if it would help.
Yeah, I need to do it. Sorry for the delay. Will try early next week.
Coincidentally, our project recently crossed a threshold of usage where this now affects us as well. Previously our app was meant to be used for only a few minutes at a time, so this didn't break things. Now it needs to run longer.
as of
commit eb355bb3edcf7f667597bae78f1be93c70e6297a (HEAD -> main, origin/main, origin/HEAD)
Author: Austin Eng <enga@chromium.org>
Date: Fri Jul 14 01:10:52 2023 +0000
it still leaks, but the rate on my test app has gone from 20 MB/s to a few KB/s. So it's gone from unusable to tolerable.
Note also that the Dawn build was broken for a couple of days. There was some defunct codegen, which they fixed today. The hash I posted here is viable. Hope that saves some time if you are experimenting with an update.
I spoke with @slimsag briefly and apparently mach-gpu/mach-gpu-dawn doesn't have this issue, which I find confusing. He didn't get into much detail but seemed confident that this leak isn't present in mach-core. Maybe some insight can be gained from that?
mach uses the same lib (Dawn), so if a leak is there it will affect every build. That said, mach may be using a different version than @meshula.
Anyway, I will start working on the update today.
I tested with Dawn's top of tree as of July 14 and saw that the bug is not fixed but greatly reduced, as reported above. The issue is reproducible for us with the zig-gamedev demos and the version of zgpu currently referenced by zig-gamedev. All the related demos show the leak on the several machines we tried, all M1 or M2 based. It would surprise me if anyone is immune to the problem, because the examples that Dawn itself ships show it too.
@meshula have you reported your findings to Google? You can refer to the fix https://bugs.chromium.org/p/dawn/issues/detail?id=1175
I think it would make sense to do so after zgpu is updated to latest. Otherwise I am only one data point, and I would be happy to discover that after the zgpu update the problem has been resolved.
Just to let you know, I'm still working on the Dawn update. Some changes in zgpu are needed, so it may take a few more days.
I've updated the Dawn binaries to the same version that Mach uses. We no longer use a submodule to fetch binaries; we use the Zig package manager instead. You will need to add build.zig.zon to your project. Please see the zgpu docs for details. I've tested this change on Windows and Linux.
@Pyrolistical @meshula We now have new binaries and @foxnne doesn't see the memory leak on macOS anymore. I highly recommend upgrading.
Thanks! We'll upgrade as soon as possible.
Can confirm leak is gone. Thank you.