hillu/go-yara

Repeatedly loading rules without restarting the program keeps increasing memory usage

vectian opened this issue · 12 comments

I use go-yara in my program. I want to reload a batch of rules on command, without restarting the program. Monitoring memory usage with `top`, I see that after each reload the memory keeps increasing and is never released.

hillu commented

My best guess is that the Go garbage collector does not collect the Rules object (which would cause yr_rules_destroy to be called), but without a reproducer, it is hard to tell.

Is the size of the memory leak similar to the size of the compiled ruleset?

hillu commented

Can you share some code sample that demonstrates the behavior?

sample: https://github.com/qtmee/yara-memory-leak.git

Browser enter http://localhost:3000/i to simulate the recreate command

hillu commented

Does adding runtime.GC() after replacing the Rules instance fix the problem for you?

How about explicitly calling s.Rules.Destroy() right before replacing it?

I have tried both approaches; memory usage still does not drop.

hillu commented

I have added a few endpoints to your sample in order to trigger GC and to log malloc statistics using the malloc_stats(3) function. Here's what I have found:
This is the initial state (before any rules have been parsed):

Arena 0:
system bytes     =     135168
in use bytes     =       4752
Arena 1:
system bytes     =     135168
in use bytes     =       2896
Arena 2:
system bytes     =     135168
in use bytes     =       2896
Arena 3:
system bytes     =     135168
in use bytes     =       2896
Arena 4:
system bytes     =     135168
in use bytes     =       2896
Total (incl. mmap):
system bytes     =     675840
in use bytes     =      16336
max mmap regions =          0
max mmap bytes   =          0

It turns out that having YARA parse a ruleset even just once will considerably grow Arena 4:

Arena 4:
system bytes     =   16281600
in use bytes     =   15946016

Forcing GC will cause the "in use bytes" value to drop, but the "system bytes" value will stay at its previous level.

Arena 5:
system bytes     =   16281600
in use bytes     =      48112

Note that for some reason, Arena 4 is now called Arena 5, but the "system bytes" value stays the same. (I don't know enough about the GNU libc heap implementation to explain this.)

Parsing the ruleset multiple times without forcing a GC will cause the rulesets to be allocated to multiple arenas.

Also note that 45216 bytes have apparently not been freed.

malloc(3) keeps freed buffers around for subsequent reuse; that is one reason the "system bytes" values do not drop. What looks like a HUGE memory leak may still contain a genuine leak in the code, but it is not as large as one would think. Most of the apparent growth comes from the GNU libc malloc(3) implementation and, probably, heap fragmentation.

The Dgraph developers seem to have run into similar effects and their solution is to use jemalloc. One can override the default malloc(3) implementation with jemalloc using LD_PRELOAD on Linux:

: ; LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 ./main

Of course this is not a proper solution but it may help you in the short term. I'll try to figure out how to fix this with the standard malloc(3).

Thank you very much for your answer. I will keep an eye on this problem and try the workaround you have suggested.

hillu commented

I have taken a closer look at the various malloc tuning parameters documented in the GNU libc manual, but they don't seem to make any difference.

Rather than setting LD_PRELOAD, you can link jemalloc in at build time, though:

; go build -ldflags="-linkmode=external -extldflags=-ljemalloc"

Do I need to modify the go-yara source code directly, i.e. change the malloc calls used in cgo to je_calloc? I have tried that but failed. Could you help me with this? Thank you.

hillu commented

No, you don't have to modify anything. Just install libjemalloc-dev from your distribution and use the -ldflags parameter I gave above.

This problem has been solved. We no longer compile the YARA rules directly at load time; the rules are precompiled, and the precompiled rules are loaded on each reload.
