wasilibs/nottinygc

Question: Does the GC execute before the `proxywasm.DispatchHttpCall` callback executes?

sandeeppnaik opened this issue · 20 comments

Hi @anuraaga !

I am running into strange behaviour when my wasm plugin executes a DispatchHttpCall. The call to the service (the dispatch call) is never made in the majority of runs, making me wonder whether the GC clears the context before the dispatch call executes.

The flow works as expected when I don't run the custom GC:
Req -> parseHeaders -> dispatchCall(ServiceA) -> callback -> getResponseHeaders -> update headers -> ReqGoesUpstream
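
Roughly, the plugin logic looks like this (a simplified sketch, not my actual code; the cluster name, authority, and header key below are placeholders):

package main

import (
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
)

func main() {
	proxywasm.SetVMContext(&vmContext{})
}

type vmContext struct{ types.DefaultVMContext }

func (*vmContext) NewPluginContext(contextID uint32) types.PluginContext {
	return &pluginContext{}
}

type pluginContext struct{ types.DefaultPluginContext }

func (*pluginContext) NewHttpContext(contextID uint32) types.HttpContext {
	return &httpContext{}
}

type httpContext struct{ types.DefaultHttpContext }

// parseHeaders -> dispatchCall(ServiceA): pause the request until the callback runs.
func (ctx *httpContext) OnHttpRequestHeaders(numHeaders int, endOfStream bool) types.Action {
	headers := [][2]string{
		{":method", "GET"},
		{":path", "/"},
		{":authority", "service-a"}, // placeholder authority
	}
	// "service_a_cluster" is a placeholder Envoy cluster name.
	if _, err := proxywasm.DispatchHttpCall("service_a_cluster", headers, nil, nil, 5000, ctx.onServiceAResponse); err != nil {
		proxywasm.LogErrorf("dispatch failed: %v", err)
		return types.ActionContinue
	}
	return types.ActionPause
}

// callback -> getResponseHeaders -> update headers -> resume so the request goes upstream.
func (ctx *httpContext) onServiceAResponse(numHeaders, bodySize, numTrailers int) {
	respHeaders, err := proxywasm.GetHttpCallResponseHeaders()
	if err != nil {
		proxywasm.LogErrorf("failed to get callout response headers: %v", err)
	}
	for _, h := range respHeaders {
		proxywasm.LogInfof("callout header %s: %s", h[0], h[1])
	}
	// Placeholder header update based on ServiceA's response.
	_ = proxywasm.ReplaceHttpRequestHeader("x-service-a", "seen")
	_ = proxywasm.ResumeHttpRequest()
}

The real plugin does more work in the callback, but the dispatch/pause/resume structure is the same.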

The metrics below measure execution_count (executions of the plugin) and success_count (completions of the callback). Below are the numbers when I don't run the custom GC: execution_count = success_count.

[image: metrics graph showing execution_count equal to success_count]

Whereas when I run the custom GC, only a portion of requests end up making the call upstream (I verified that requests don't reach ServiceA).

[image: metrics graph showing success_count lower than execution_count]

My main.go has these imports

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"os"
	"strings"
	"time"

	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/tidwall/gjson"
	_ "github.com/wasilibs/nottinygc"
	"golang.org/x/exp/slices"
)

And the file ends with this stub after all the functions:

// required for nottinygc
//
//export sched_yield
func sched_yield() int32 {
	return 0
}

Appreciate your thoughts on this behaviour, and thank you for this package :)

Hi @sandeeppnaik - would it be possible to get a reproduction showing the issue? If there's a repo with a command to build the plugin and a docker compose command to start two envoys to show the problem, that would be very helpful. If not, I'll try to set it up but it may take some more time.

Nothing immediately comes to mind as a potential problem, so I will need to debug a repro. For the question in the title specifically, GC can happen with any allocation, and there are indeed allocations happening during that call. Though that should be ok in principle.

Hey @anuraaga, I have added the plugin code on 2 branches.

You will need a dummy server with an endpoint for the plugin to return success.

and there are indeed allocations happening during that call -> Does the location of the stub method in the code matter at all?

Thanks for the report and repro @sandeeppnaik, I found an issue with the recent addition of precise GC. v0.6.0 includes the fix and trying it with your repro, it seems to be working well. Let me know if you still have any issues.

Thanks for looking into this promptly! I am running v0.6.0 now, and I can definitely see that all the dispatch calls are working as expected. A question about the garbage collection:
Here, the first bump beginning at 15:40 is without the custom GC vs. the second bump at 16:02 with it.
[image: memory usage graph; first bump at 15:40 without the custom GC, second bump at 16:02 with it]

The ramp-up of memory when running the custom GC is better than without it. However, I would expect the second bump to go down over time and free up the memory, but I'm not seeing that. Any idea why, @anuraaga?

@sandeeppnaik IIUC, this is a graph of real memory usage of the Envoy process. WebAssembly currently does not define any way to return memory to the OS, so any memory that has been allocated will stay allocated forever. Environment variables like GC_PRINT_STATS can provide information about how much of the allocated memory is actually in use by objects, if that is what you are interested in:

https://github.com/ivmai/bdwgc#installation-and-portability

and the hope is that when the process stabilizes it will efficiently use the memory it has already allocated from the OS. But unfortunately due to the fundamental limitation of wasm, there is no way or workaround to allow unused memory to ever be released completely.

this is a graph of real memory usage of the Envoy process -> yep, this was Envoy's memory usage. Ah, I understand, so the plugin never returns the memory back to the proxy but reuses whatever is already allocated. If that is the case, the memory usage of the plugin should stay consistent for a consistent load pattern (sustained throughput of x rps, with no load between 2 consecutive load tests), right? However, I see it growing continuously until it hits the proxy's memory limit.

[image: memory usage graph growing until the proxy's memory limit]
{"level":"warn","time":"2023-10-23T05:47:12.025635Z","msg":"Envoy may have been out of memory killed. Check memory usage and limits."}
{"level":"error","time":"2023-10-23T05:47:12.025679Z","msg":"Envoy exited with error: signal: killed"}
{"level":"error","time":"2023-10-23T05:47:12.025715Z","msg":"error serving tap http server: http: Server closed"}

It would be good to confirm it is the memory within a VM going up and not the number of VMs itself; Envoy has the metrics wasm.envoy.wasm.runtime.v8.active and wasm.envoy.wasm.runtime.v8.created to show the number of VMs. It could be an issue with GC, but we need to confirm the VM count first; unfortunately there isn't a per-VM memory usage stat yet.

I noticed both of the above metrics stay constant throughout the load test while the memory spikes

curl http://localhost:15090/stats/prometheus | grep v8
# TYPE envoy_wasm_envoy_wasm_runtime_v8_created counter
envoy_wasm_envoy_wasm_runtime_v8_created{} 10
# TYPE envoy_wasm_envoy_wasm_runtime_v8_active gauge
envoy_wasm_envoy_wasm_runtime_v8_active{} 12

Interestingly, with the previous version, nottinygc v0.5.1, the memory does not spike. But I'm not entirely sure whether that is because it prevents the plugin from making the dispatch call or not.

Thanks for confirming - I'm assuming the plugin you are testing against isn't the same as in https://github.com/sandeeppnaik/wasm-test/tree/main? FWIW, I can't reproduce increasing memory usage locally using that repo - it might be harder to debug issues without a closer reproduction in that case. Other things to try are running with the GC_PRINT_STATS environment variable set and seeing if nottinygc v0.4.0 has any different behavior.

Hi @sandeeppnaik - I got another report with a potential fix

#31 (comment)

Perhaps if you get a chance you could try the 667d0d4 commit to see if it makes a difference. Thanks.

Unfortunately, still seeing the same memory-spiking behaviour with this, @anuraaga. My plugin is pretty much the above example. May I know what throughput you are testing with? When I run at about ~600 rps, I can reproduce the memory spikes.

[image: memory usage graph showing spikes at ~600 rps]

Also tested with v0.4.0; it seems to have the same behaviour.

Tested this too -> github.com/higress-group/nottinygc v0.0.0-20231019105920-c4d985d443e1 seems to prevent the memory leak!

Thanks for the confirmation - it seems like the issue with malloc affects callbacks more strongly than other apps. I will go ahead and incorporate that soon.

#34 has a new build tag for Envoy plugins; would you be able to try the f38ad42 commit with the build tag nottinygc_proxywasm (you can also remove the sched_yield stub)?

Getting a panic while compiling the binary with `tinygo build -o main.wasm -gc=custom -tags=nottinygc_proxywasm -scheduler=none -target=wasi main.go`:

panic: cannot convert pointer value to byte

goroutine 197 [running]:
github.com/tinygo-org/tinygo/interp.rawValue.bytes(...)
        /Users/runner/work/tinygo/tinygo/interp/memory.go:743
github.com/tinygo-org/tinygo/interp.rawValue.Uint(...)
        /Users/runner/work/tinygo/tinygo/interp/memory.go:751
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2c640, {0xc008e01690, 0x4, 0x0?}, 0xc009b68720, {0xc006a3b3c0, 0x1c})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:828 +0x21f6
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d4f0, {0xc008e00b10, 0x4, 0x0?}, 0xc009b68360, {0xc007114900, 0x18})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d4a0, {0xc005d70f90, 0x1, 0x0?}, 0xc009b55f20, {0xc0071148a0, 0x14})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d450, {0xc0047f8e10, 0x2, 0x0?}, 0xc009b55e90, {0xc006cd4ac0, 0x10})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d400, {0xc008d9b390, 0x2, 0x0?}, 0xc009b55b30, {0xc006cd47a0, 0xc})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d3b0, {0xc008d9a290, 0x1, 0x0?}, 0xc009b554d0, {0xc006c4b2d8, 0x8})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.(*runner).run(0xc00910dd60, 0xc009f2d360, {0x0, 0x0, 0x100066faa?}, 0x0, {0x104a4b0a0, 0x4})
        /Users/runner/work/tinygo/tinygo/interp/interpreter.go:526 +0x7ccc
github.com/tinygo-org/tinygo/interp.Run({0x110f05240?}, 0x29e8d60800?, 0x0?)
        /Users/runner/work/tinygo/tinygo/interp/interp.go:117 +0x67e
github.com/tinygo-org/tinygo/builder.optimizeProgram({0x110f04080?}, 0xc000024240)
        /Users/runner/work/tinygo/tinygo/builder/build.go:1046 +0x35
github.com/tinygo-org/tinygo/builder.Build.func5(0xc00907d9e0)
        /Users/runner/work/tinygo/tinygo/builder/build.go:576 +0x61d
github.com/tinygo-org/tinygo/builder.runJob(0xc00907da40, 0x0?)
        /Users/runner/work/tinygo/tinygo/builder/jobs.go:222 +0x4d
created by github.com/tinygo-org/tinygo/builder.runJobs in goroutine 1
        /Users/runner/work/tinygo/tinygo/builder/jobs.go:123 +0x5bd

@sandeeppnaik I'm not sure if it would cause that crash, but the invocation seems to be missing the custommalloc build tag. Also good to confirm you've removed the sched_yield function from your own code.

Thanks for writing back, @anuraaga. My bad, added this: `tinygo build -o main.wasm -gc=custom -scheduler=none -target=wasi -tags="custommalloc nottinygc_proxywasm" main.go`.

Works like a charm! Confirming there is no memory leak at high load, and I am not implementing the stub method 👍

@sandeeppnaik I have released a new version with the changes. A heads-up that I ended up naming the tag nottinygc_envoy in the end. Hopefully this works fine now, and thanks for the patience.

Hey @anuraaga, noticing a behaviour where I increment a counter metric just prior to returning from the callback / ResumeHttpRequest, but I see that metric never incrementing. Is this a case of very eager garbage collection? I don't see the same behaviour with the higress version. Any thoughts?
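
Roughly, the pattern is the following (a fragment building on the earlier sketch; the metric name is a placeholder, and I'm assuming the SDK's DefineCounterMetric / MetricCounter helpers here):

// Assumed to be the SDK's MetricCounter type; defined once at plugin start.
var successCounter proxywasm.MetricCounter

func (*pluginContext) OnPluginStart(pluginConfigurationSize int) types.OnPluginStartStatus {
	// Placeholder metric name.
	successCounter = proxywasm.DefineCounterMetric("wasm_custom.callback_success_count")
	return types.OnPluginStartStatusOK
}

// The DispatchHttpCall callback: increment just before resuming the request.
// (main / vmContext / context wiring omitted; same as the earlier sketch.)
func (ctx *httpContext) onServiceAResponse(numHeaders, bodySize, numTrailers int) {
	successCounter.Increment(1) // this increment never shows up in the stats
	_ = proxywasm.ResumeHttpRequest()
}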

We generally wouldn't expect GC to have an effect since there are no allocations when incrementing a metric.

https://github.com/tetratelabs/proxy-wasm-go-sdk/blob/main/proxywasm/hostcall.go#L759

But you never know, would need a repro to debug it.

I have added an e2e test that is based off of your wasm-test

https://github.com/wasilibs/nottinygc/tree/main/e2e/envoy-dispatch-call
https://github.com/wasilibs/nottinygc/blob/main/magefiles/e2e.go#L68

Checking the counters after a minute of running, they seem to be stable, but perhaps you're seeing issues after a longer runtime, or maybe there are some differences with the test case? If you have any suggestions for that e2e test, a PR would be great. Thanks.