Memory leak in Netflow::TemplateRegistry when cache_save_path parameter is not provided
123joshuawu opened this issue · 0 comments
Logstash information:
Version: 7.16.3
JVM (e.g. java -version):
openjdk version "1.8.0_342"
OpenJDK Runtime Environment (build 1.8.0_342-8u342-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.342-b07, mixed mode)
OS version (uname -a if on a Unix-like system):
Description of the problem including expected versus actual behavior:
TLDR: if cache_save_path is not provided, Netflow::TemplateRegistry does not call do_cleanup, which is in charge of cleaning up the Vash memory caches.
In our testing, Logstash heap memory usage increased continually until the process crashed with an out-of-memory exception. This happened roughly every four hours in our environment.
Comparing heap dumps taken within those four hours, we noticed the memory usage of one object grow by more than 4x. (Right side is the baseline; left side is the dump from the OOM crash.)
Opening up the object, we can determine the class name from the metadata.
We trace this back to the corresponding source code:
logstash-codec-netflow/lib/logstash/codecs/netflow.rb
Lines 537 to 553 in b7df239
In the heap dump screenshot, var2 and var5 correspond to the two instances of Vash used in the TemplateRegistry. In our testing, the memory usage of these two objects grew continuously.
Looking at the Vash implementation, we can see that it requires a manual cleanup call in order to release memory.
https://gist.github.com/joshaven/184837
The Vash object will forget any answer that is requested after the specified
TTL. It is a good idea to manually clean things up from time to time because
it is possible that you'll cache data but never again access it and therefor
it will stay in memory after the TTL has expired. To clean up the Vash object,
call the method: cleanup!
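To illustrate why that manual call matters, here is a minimal sketch of a TTL hash. It is not the real Vash code: the class name TtlCache, its methods, and the injectable clock are all hypothetical, chosen only to show that an entry which is never read again stays in memory after its TTL until cleanup! runs.

```ruby
# Hypothetical stand-in for a TTL hash; NOT the real Vash implementation.
class TtlCache
  # clock is injectable so the behaviour can be shown without sleeping.
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @values = {}
    @expires_at = {}
  end

  # Store a value together with its expiry time.
  def store(key, value, ttl)
    @values[key] = value
    @expires_at[key] = @clock.call + ttl
    value
  end

  # Expired entries are only forgotten when someone asks for them.
  def [](key)
    return nil unless @values.key?(key)
    if expired?(key)
      delete(key)
      return nil
    end
    @values[key]
  end

  # Number of entries currently held, expired or not.
  def size
    @values.size
  end

  # Sweep every expired entry, including ones that are never read again.
  def cleanup!
    @expires_at.keys.each { |key| delete(key) if expired?(key) }
    self
  end

  private

  def expired?(key)
    @clock.call >= @expires_at[key]
  end

  def delete(key)
    @values.delete(key)
    @expires_at.delete(key)
  end
end
```

In this sketch, entries that are written once and never looked up again are released only by cleanup!, which matches the behaviour the gist describes.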
In TemplateRegistry, the cleanup call for both Vash objects is made in the TemplateRegistry::do_cleanup method.
logstash-codec-netflow/lib/logstash/codecs/netflow.rb
Lines 661 to 667 in b7df239
do_cleanup is then only ever called in do_persist:
logstash-codec-netflow/lib/logstash/codecs/netflow.rb
Lines 643 to 659 in b7df239
However, note that on line 644, if file_path is not provided, the do_persist function exits early, skipping the call to do_cleanup.
file_path can then be traced back to the cache_save_path setting in the initialization of the TemplateRegistry.
logstash-codec-netflow/lib/logstash/codecs/netflow.rb
Lines 67 to 68 in b7df239
Thus this situation occurs whenever no value is provided for cache_save_path: file_path is then nil by default, so do_cleanup is always skipped.
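The control flow described above can be reduced to a small sketch. Everything here is an assumption made for illustration: RegistrySketch, the cleanup_calls counter, write_cache_file, and do_persist_with_cleanup_first are hypothetical and do not come from logstash-codec-netflow. Only the names do_persist, do_cleanup, and file_path are taken from the plugin. The second method shows one possible reordering, not necessarily the fix the maintainers would choose.

```ruby
# Illustrative sketch only; class and method bodies are assumptions,
# not the code from logstash-codec-netflow.
class RegistrySketch
  attr_reader :cleanup_calls

  def initialize(file_path = nil)
    @file_path = file_path # nil when cache_save_path is not configured
    @cleanup_calls = 0
  end

  # Mirrors the reported pattern: the guard returns before cleanup runs.
  def do_persist
    return if @file_path.nil?
    do_cleanup
    write_cache_file
  end

  # One possible reordering: always clean up, then persist only if a path exists.
  def do_persist_with_cleanup_first
    do_cleanup
    return if @file_path.nil?
    write_cache_file
  end

  private

  def do_cleanup
    @cleanup_calls += 1 # stand-in for calling cleanup! on each Vash
  end

  def write_cache_file
    # Persistence is omitted in this sketch.
  end
end
```

With no path configured, repeated calls to do_persist leave the counter at zero, while the reordered variant increments it on every call.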
Steps to reproduce:
Provide logs (if relevant):