wasilibs/nottinygc

Memory leak exists, should use GC_malloc_ignore_off_page when allocating large objects.


Hi @anuraaga, thanks for your project. We are using nottinygc in our Higress project, which is based on Envoy's wasm extension mechanism.

We discovered a memory leak during use; you can find demo code to reproduce it here:
https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/gc-test

Note that we replaced nottinygc with github.com/higress-group/nottinygc. If we comment out this replacement, like this:

...
// replace github.com/wasilibs/nottinygc v0.5.1 => github.com/higress-group/nottinygc v0.0.0-20231019105920-c4d985d443e1

require (
	github.com/alibaba/higress/plugins/wasm-go v0.0.0
	github.com/tetratelabs/proxy-wasm-go-sdk v0.22.0
	github.com/tidwall/gjson v1.14.3
)
...

Then build the wasm file and start envoy with the following configuration:

node:
  cluster: test-cluster
  id: test-idn

admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 127.0.0.1, port_value: 10000 }
    per_connection_buffer_limit_bytes: 1024000000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: gctest
            typed_config:
              "@type": type.googleapis.com/udpa.type.v1.TypedStruct
              type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
              value:
                config:
                  name: basic_auth
                  vm_config:
                    runtime: envoy.wasm.runtime.v8
                    code:
                      local:
                        filename: ./mem.wasm
                    allow_precompiled: true
                  fail_open: true
                  configuration:
                    "@type": type.googleapis.com/google.protobuf.StringValue
                    value: '{"bytes": 10485760}'
          - name: envoy.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains:
              - "*"
              routes:
              - match:
                  prefix: /
                direct_response:
                  status: 200
                  body:
                    inline_string: "hello world"

Use the following command to perform a stress test:

wrk -R 50 -t 10 -c 10 -d60s -s post.lua http://localhost:10000/

After a few seconds, we can see the following log:

[2023-10-23 15:42:30.858][1902490][info][wasm] [source/extensions/common/wasm/context.cc:1204] wasm log basic_auth: [gc-test] {"Sys": 1063452672,"HeapSys": 1058013184,"HeapIdle": 145313792,"HeapInuse": 0,"HeapReleased": 0}
[2023-10-23 15:42:30.861][1902490][error][wasm] [source/extensions/common/wasm/wasm_vm.cc:41] Function: proxy_on_request_headers failed: Uncaught RuntimeError: unreachable\nProxy-Wasm plugin in-VM backtrace:\n  0:  0x1d4fc - runtime._panic\n  1:  0x1ce84 - runtime.alloc\n  2:  0x26758 - main.onHttpRequestHeaders\n  3:  0x3761a - proxy_on_request_headers

Thanks for the report @johnlanni - to confirm, are you saying you don't see the issue with the patch in your branch of nottinygc (https://github.com/higress-group/nottinygc/pull/1/files), or that you do see it with that as well? I'm currently running locally first with the code in that branch, using higress-group/nottinygc, and I see the issue as well, so I just want to confirm that.

We resolved the issue in github.com/higress-group/nottinygc with these changes: https://github.com/higress-group/nottinygc/pull/1/files.

There are two key points:

  1. Use GC_malloc_ignore_off_page instead of GC_malloc when allocating large objects, following the recommendation of bdwgc:
    (screenshot of the bdwgc README recommendation to use GC_malloc_ignore_off_page for large objects; see the README link further below)
  2. Do not export the malloc symbol in init.go, because proxy-wasm looks for this symbol when loading the wasm module, and Envoy uses this function to allocate memory in the wasm VM:
    https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/b7e690703c7f26707438a2f1ebd7c197bc8f0296/src/wasm.cc#L169-L172
    https://github.com/proxy-wasm/proxy-wasm-cpp-host/blob/b7e690703c7f26707438a2f1ebd7c197bc8f0296/include/proxy-wasm/wasm.h#L403
    The exported symbol is bound to mi_malloc, bypassing bdwgc's GC_malloc, so the GC never manages that memory.

If the "malloc" symbol is not exported, proxy wasm will use the "proxy_on_memory_allocate" symbol. The symbol is correctly bound to the function when using the tetrate sdk: https://github.com/tetratelabs/proxy-wasm-go-sdk/blob/1b9daaf70730bd6197ca8a34b2a8af713bbce7ad/proxywasm/internal/abi_callback_alloc.go#L20

Thanks for the pointers, and sorry - I had an issue with my build command, and I'm now able to verify the fix. For me, only the ignore_off_page change seems to be enough to avoid the issue, so I will start with it and think more about malloc. AFAIK, Envoy uses malloc to allocate static memory, so it should not need to be GC'd (languages without a GC would otherwise basically be unable to implement it); in fact, routing it through the GC could worsen GC performance.

For the first change I sent #32 - perhaps you can try using commit 667d0d4 to see if you get the same behavior as me, i.e. that it alone is enough to fix your repro? Thank you.

@anuraaga Envoy needs to use malloc in the wasm module when copying memory from the host into the wasm VM, such as copying the request body into the wasm module's memory. This memory should be managed by the wasm VM, not by Envoy itself. So the second change is also needed; otherwise, this copied memory will leak.

Thanks @johnlanni - I also found https://prog.world/webassembly-will-bring-them-all-together/ which mentions it. That's a surprising model 😅 In that case it does seem like the only option is to stop exporting malloc; I will do some more checking on it and then make the change.

@anuraaga Yes, this is a bit counterintuitive, but Envoy doesn't know when the wasm module has finished using the copied buffer, so it can only be released by the wasm module itself.

I have another question that confuses me a bit: why do we need to introduce mimalloc after adopting bdwgc? Bdwgc doesn't really hand memory back to the OS, right? So is the memory pool in mimalloc even used?

A bit unsure what the difference is, but it is recommended by bdwgc and seems to make a big difference in some apps.

My understanding is that when using GC_malloc, bdwgc assumes any location inside the allocated memory may be referenced by pointers, so the larger the allocation, the greater the chance that a stray value is mistaken for a pointer into it, creating a false reference.

See: https://github.com/ivmai/bdwgc/blob/242a3a7b6040c2680e009367e96a77b0e167619a/README.md?plain=1#L106-L128
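To make the failure mode concrete, here is a hypothetical sketch (the 10 MiB figure comes from the stress-test configuration above; nothing here is from the actual repro code):

package main

func main() {
	// Conservative scanning treats every machine word as a potential pointer.
	body := make([]byte, 10<<20) // one 10 MiB request body, as in the repro config

	// 1. Words inside body that happen to alias heap addresses falsely
	//    retain unrelated objects; the larger the buffer, the more such words.
	// 2. With plain GC_malloc, a stray word anywhere (stack, globals, other
	//    objects) that aliases any address within body's multi-page span
	//    falsely retains body itself. GC_malloc_ignore_off_page only honors
	//    references near the start of the object (the client promises to keep
	//    one), which is why bdwgc recommends it for large objects and why
	//    omitting it leaks them.
	_ = body
}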

I have another question that is a bit confusing. Why do we need to introduce mimalloc after using bdwgc?

This is inherited from the original exploration, where it improved performance in apps that use malloc a lot, like coraza-proxy-wasm. Later we added bdwgc, and I think that was the far more significant improvement, so I have been wondering whether dlmalloc would perform fine and want to compare them. Mimalloc may still bring general improvements besides pooling, though.

My understanding is that when using GC_malloc, bdwgc will consider that any section in the allocated memory will be refrenced by pointers,

Since 0.5, we switched from GC_malloc to the _typed variants, which indicate which parts may be pointers, and most importantly that there aren't pointers in byte arrays. Notably, there is also a version of _typed that ignores off-page pointers, so these appear to be orthogonal issues. I'll try to look more at the bdwgc code to see if I can understand the difference.
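For context, bdwgc's typed API and its ignore-off-page variant look roughly like this through cgo (a sketch only; the include paths, the bigObjSize threshold, and the helper name are assumptions, not nottinygc's actual code):

package main

/*
#include "gc.h"
#include "gc_typed.h"
*/
import "C"
import "unsafe"

const bigObjSize = 16 * 1024 // assumed large-object threshold, for illustration

// allocTyped sketches typed allocation: the bitmap marks which words of the
// object may hold pointers (bit i set means word i is scanned), so byte
// arrays are never scanned at all. The ignore-off-page variant adds the
// promise that a pointer near the start of the object is kept while it is
// live; the two mechanisms are independent, matching the comment above.
func allocTyped(sizeBytes uintptr, bitmap []C.GC_word, numWords uintptr) unsafe.Pointer {
	descr := C.GC_make_descriptor(&bitmap[0], C.size_t(numWords))
	if sizeBytes >= bigObjSize {
		return C.GC_malloc_explicitly_typed_ignore_off_page(C.size_t(sizeBytes), descr)
	}
	return C.GC_malloc_explicitly_typed(C.size_t(sizeBytes), descr)
}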

Hi @anuraaga, we will track and test new versions of nottinygc in Higress and add CI tests for them. If you have any suggestions about this, I would be happy to hear them.

Thanks @johnlanni! I also realized that the manual coraza-proxy-wasm ftw test I always use should be in CI in this repo. Any other tests will also be welcome here, or there as well, depending on the scope. I'll ping you after adding the coraza one to see what makes sense structure-wise. Thank you!

@johnlanni I have added an e2e test based on the higress gc-test

https://github.com/wasilibs/nottinygc/tree/main/e2e/higress-gc-test
https://github.com/wasilibs/nottinygc/blob/main/magefiles/e2e.go#L115

Let me know if you have any suggestions for it. Thanks.

@anuraaga That looks great 👍. Maybe we should add more types of memory allocation to this plugin (with or without layout, etc.) to cover the different if-else branches here (see the sketch after this snippet):

nottinygc/gc.go, lines 86 to 113 at a3593a7:

if layout&1 != 0 {
	// Layout is stored directly in the integer value.
	// Determine format of bitfields in the integer.
	const layoutBits = uint64(unsafe.Sizeof(layout) * 8)
	var sizeFieldBits uint64
	switch layoutBits { // note: this switch should be resolved at compile time
	case 16:
		sizeFieldBits = 4
	case 32:
		sizeFieldBits = 5
	case 64:
		sizeFieldBits = 6
	default:
		panic("unknown pointer size")
	}
	layoutSz := (layout >> 1) & (1<<sizeFieldBits - 1)
	layoutBm := layout >> (1 + sizeFieldBits)
	buf = allocSmall(size, layoutSz, layoutBm)
} else if layoutPtr == nil {
	// Unknown layout, assume all pointers.
	if size >= bigObjSize {
		buf = C.GC_malloc_ignore_off_page(C.uint(size))
	} else {
		buf = C.GC_malloc(C.uint(size))
	}
} else {
	buf = allocLarge(size, layoutPtr)
}
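Allocations along these lines could exercise each branch (a hypothetical sketch; TinyGo decides how the layout reaches runtime.alloc, so which branch each line hits is a plausible guess rather than a guarantee):

package main

type withPtr struct {
	n int
	p *int // one pointer word: layout fits the tagged-integer form
}

func exerciseAllocator() {
	small := &withPtr{}         // small object with known layout: allocSmall branch
	raw := make([]byte, 64)     // pointer-free data
	big := make([]byte, 1<<20)  // large object: the GC_malloc_ignore_off_page path
	ptrs := make([]*int, 1<<16) // large pointer-ful object: allocLarge branch
	_, _, _, _ = small, raw, big, ptrs
}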

Hi,
We have a similar issue with a wasm plugin built with TinyGo. The Envoy instance that hosts the plugin consumes several hundred percent CPU while its memory usage keeps increasing; it seems that nottinygc cannot release the memory. We tried to implement the simplest test to reproduce this case here.
Does anyone have any idea?

Hi @behnamnikbakht - are you seeing perpetually increasing memory usage over time, or is it that memory is not reduced when idle? Unfortunately, wasm has no means to return memory to the OS; this is fundamental to the spec and nothing a GC can do about it. So resident memory will never go down, even when idle.

By the way, your test case appears to be the same as this e2e test:

https://github.com/wasilibs/nottinygc/blob/main/e2e/higress-gc-test/main.go#L64

AFAICT it is behaving as expected.

@anuraaga

Wasm has no means to return memory to the OS; this is fundamental to the spec and nothing a GC can do about it. So resident memory will never go down, even when idle

In our production environment, the wasm plugin processes 2000 requests per minute. The plugin heavily uses make for allocation. Our performance environment simulated 2000 RPM using JMeter and ran the test continuously for almost 2 days. Resident memory of the Envoy process grew from 150 MB to 900 MB (see the graph below). If RES memory never goes down, restarting Envoy every day is the only option. Any suggestions, please?


[Graph: Envoy resident memory growing from about 150 MB to 900 MB over the two-day test]

Thanks for the graph @cheranm - I realized that a lot of the focus in our e2e tests is on GC allocation rather than process memory. When running the higress e2e and watching docker stats, I do see the same growing physical memory, despite our reported GC heap size staying constant at around 20 MB. I instrumented the binary with additional debugging, including logging mimalloc statistics, and could confirm that all the numbers check out at 20 MB.

I don't know what to make of this: the numbers reported by bdwgc and mimalloc could be wrong for some reason, but I don't think that is likely. In that case, though, the memory increase is outside what this library allocates, and I have no clues on how to debug that.
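For anyone wanting to reproduce that cross-check, instrumentation along these lines can set the allocator's view of the heap against the process RES (a sketch using runtime.ReadMemStats and the tetrate SDK logger, matching the log format earlier in this thread; not the exact debugging code used above):

package main

import (
	"runtime"

	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
)

// logHeapStats reports the allocator's view of memory so it can be compared
// with the resident memory reported by the OS or docker stats. A flat
// HeapSys alongside growing RES points at memory outside this allocator.
func logHeapStats() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	proxywasm.LogInfof(`{"Sys": %d, "HeapSys": %d, "HeapIdle": %d, "HeapInuse": %d, "HeapReleased": %d}`,
		m.Sys, m.HeapSys, m.HeapIdle, m.HeapInuse, m.HeapReleased)
}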