bytecodealliance/wasmtime-go

memory usage change in 0.37.0?

srenatus opened this issue · 9 comments

Hey there. I've found that after updating to 0.37.0, we've got a benchmark that consistently fails with "signal: killed":
https://github.com/open-policy-agent/opa/runs/6932889970?check_suite_focus=true#step:3:416

Are there any known issues around memory use in 0.37.0? Anything I can do to help debug this? 😕

In my experience on CI, a "killed" signal typically means "OOM and killed by the OOM killer", so I agree that the thing to look at here is memory usage. We have one known issue, introduced in 0.37.0 and fixed in 0.38.0 (not released yet), where stack frames for wasm functions might be much larger than before. I'm not sure that alone would be enough to trigger the OOM killer, though.

Are you able to reproduce this locally? Perhaps not the kill signal, but could you capture the max RSS for 0.36.0 and 0.37.0 and see if they're egregiously different? If that's the case, it would be helpful to dig into where all that RSS is going with a heap profiler or similar.
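If GNU time isn't handy, here's a minimal sketch of reading the max RSS from inside the test process itself (assuming a Unix platform; note that `Maxrss` is reported in kilobytes on Linux and in bytes on macOS):

```go
package main

import (
	"fmt"
	"syscall"
)

// printMaxRSS prints the peak resident set size recorded for this process
// so far. Units differ by platform: kilobytes on Linux, bytes on macOS.
func printMaxRSS() {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		fmt.Println("getrusage failed:", err)
		return
	}
	fmt.Printf("max RSS: %d\n", ru.Maxrss)
}

func main() {
	printMaxRSS()
}
```

Calling something like this at the end of a benchmark (or from TestMain) should give roughly the same number that GNU time reports as "Maximum resident set size".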

Thanks for your response!

I've not been able to repro the "killed" signal locally, but running the test through time -v (with GNU time) gives me this with wasmtime-go 0.37.0:

	Command being timed: "go test -tags=opa_wasm,slow -timeout 30m -run=- -bench=. -benchmem ./internal/wasm/sdk/opa"
	User time (seconds): 128.44
	System time (seconds): 39.18
	Percent of CPU this job got: 114%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:25.76
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 6800804
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 43479
	Minor (reclaiming a frame) page faults: 7198773
	Voluntary context switches: 39929
	Involuntary context switches: 3209555
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 44998
	Page size (bytes): 4096
	Exit status: 0

With wasmtime-go 0.36.0:

	Command being timed: "go test -tags=opa_wasm,slow -timeout 30m -run=- -bench=. -benchmem ./internal/wasm/sdk/opa"
	User time (seconds): 94.78
	System time (seconds): 11.87
	Percent of CPU this job got: 125%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 1:24.99
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 535888
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 43541
	Minor (reclaiming a frame) page faults: 723264
	Voluntary context switches: 40714
	Involuntary context switches: 3046118
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 44440
	Page size (bytes): 4096

Insightful? 🤔 Well, I guess the max RSS is quite a difference: 535M vs 6G?

Yes, I think a jump to 6G RSS would definitely cause a SIGKILL in GitHub Actions (which has more limited RAM than you probably have locally).

As to the question of where this memory is going: do you know if Go has a heap profiler that could help here? Even if it does, it may not be too useful, since most of the allocation here is probably happening in Rust and may be hidden from Go, but if Go intercepts malloc, free, and maybe mmap, that could be a good start.
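For reference, Go's built-in heap profiler can be driven from the test code itself; here's a minimal sketch using the standard runtime/pprof package (as noted above, it only sees Go-managed allocations, not what the C library allocates via malloc/mmap):

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current Go heap profile to path, suitable for
// inspection with `go tool pprof`. Allocations made inside the wasmtime C
// library do not go through the Go runtime and will not show up here.
func writeHeapProfile(path string) {
	f, err := os.Create(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	runtime.GC() // force a GC so the profile reflects up-to-date statistics
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
}
```

Alternatively, running the benchmarks with `go test -memprofile=mem.out` writes the same kind of profile automatically.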

Also, if possible, could you try testing out a build of Wasmtime from main? I think 0.38.0 is scheduled to be released tomorrow, so if it's difficult to use a git version, waiting is probably also ok.

Thanks!

I think Go's profiling will miss out on what happens in the C lib, since it's also not accounted for in the benchmark results there. But trying out 0.38.0 when it's released is simple enough; I'll do that if I don't get to testing with main today. 🤞

So I've pulled down wasmtime-dev-x86_64-macos-c-api.tar.xz from the dev release (337c1ca), put the libwasmtime.a from it at vendor/github.com/bytecodealliance/wasmtime-go/build/macos-x86_64/libwasmtime.a, and ran the benchmark command. It gives me this:

	Command being timed: "go test -tags=opa_wasm,slow -timeout 30m -run=- -bench=. -benchmem ./internal/wasm/sdk/opa"
	User time (seconds): 151.16
	System time (seconds): 38.37
	Percent of CPU this job got: 149%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:06.47
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 6793280
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 23207
	Minor (reclaiming a frame) page faults: 6763319
	Voluntary context switches: 38198
	Involuntary context switches: 3154536
	Swaps: 0
	File system inputs: 0
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 44230
	Page size (bytes): 4096
	Exit status: 0

Which looks like almost no difference. I'm not sure if I've done something wrong, though. 🙃

I've done a bit more testing locally and have bisected this to bytecodealliance/wasmtime#3989. The main branch of Wasmtime indeed does not fix things, so this isn't bytecodealliance/wasmtime#4222 resurfacing. That means you probably won't want to update to 0.38.0 yet either, since it will exhibit the same behavior as 0.37.0. This pattern looks like a memory leak, although more investigation is still needed to figure out why a register allocator update would trigger that.

Ok, to the best of my ability to determine what's going on here, I believe the issue is bytecodealliance/wasmtime#4291. The wasm module you're compiling is taking 6G+ of memory to compile in Wasmtime itself. I couldn't actually find any leaks, but I believe it's unexpected that this module takes 6G or more of memory to compile.
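For context, the memory is spent while compiling the module, i.e. roughly at this point in a wasmtime-go embedding (a minimal sketch against the public wasmtime-go API; `policy.wasm` is just an illustrative file name, not the actual OPA module):

```go
package main

import (
	"log"

	"github.com/bytecodealliance/wasmtime-go"
)

func main() {
	engine := wasmtime.NewEngine()

	// Compiling the module is where Wasmtime's code generator runs; this is
	// the step observed to need 6G+ of memory for the affected OPA module.
	// "policy.wasm" is an illustrative path.
	module, err := wasmtime.NewModuleFromFile(engine, "policy.wasm")
	if err != nil {
		log.Fatal(err)
	}
	_ = module // instantiation and execution would follow here
}
```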

The specific function in question in the wasm input is g0.data.a.b.c.allow. I don't personally know enough about how OPA is set up and where that function is coming from (it looks like y'all are generating wasm files, so it may not correspond directly to source). While this is definitely something to fix on our end, the best workaround for now would probably be to make that function smaller if possible. Otherwise, it's probably best to stick to 0.36.0 until the issue is resolved in Wasmtime, which will likely be released with 0.39.0.

Thank you so much for digging in. No bandwidth to change how functions are generated, I'm afraid -- we'll have to pass on upgrading until 0.39.0. It's probably unlikely to come up as a problem for any real policy, though: those benchmark cases are a bit on the extreme side. So I'm hoping it won't negatively affect anyone too much who uses OPA's wasm modules with wasmtime 0.37.0/0.38.0.

Ok, I'm going to close this now that bytecodealliance/wasmtime#4291 has been fixed; the fix will be published with 0.39.0.