jonmagic/yara-ffi

Fix memory leak


kmcq commented

Job to be done

We are seeing a memory leak in the gem at version 2.1.0. Let's remove the leak!

kmcq commented

We are calling yr_finalize in an ensure block. This method "must be called by the main thread to release any resource allocated by the library", so I would expect it to free any YARA memory. This could mean that we're allocating memory through FFI and not freeing it.

kmcq commented

@jonmagic added a failing test in the fix-memory-leak branch:

```ruby
require "objspace" # provides ObjectSpace.memsize_of_all

# `rule` is presumably defined elsewhere in the test class.
def test_there_is_no_memory_leak
  Yara.test(rule, "i think we were here that one time") # warm-up run
  baseline = ObjectSpace.memsize_of_all
  memory_sizes = []
  100.times do
    Yara.test(rule, "i think we were here that one time")
    memory_sizes << ObjectSpace.memsize_of_all
  end
  # Average growth between consecutive runs.
  deltas = memory_sizes.each_cons(2).map { |previous, current| current - previous }
  on_average_grew_by = deltas.sum / deltas.size
  assert memory_sizes.all? { |size| size < baseline + 10_000 },
         "Memory leak detected, baseline was #{baseline} bytes and it grew by #{on_average_grew_by} bytes on average per Yara.test execution."
end
```

kmcq commented

We create three ::FFI::MemoryPointer objects:

  1. compiler_pointer
  2. rules_pointer
  3. test_string_pointer

Calling MemoryPointer#free in the ensure block on either the compiler or the rules pointer causes a serious error. Doing the same for test_string_pointer does not fix the leak according to the test; in fact, removing that pointer entirely does not fix it either.

kmcq commented

According to our test setup, any use of FFI causes a leak. For example:

```ruby
def self.test(rule_string, test_string)
  scanning = true
  results = []

  results
end
```

has no leak, but

```ruby
def self.test(rule_string, test_string)
  user_data = UserData.new
  scanning = true
  results = []

  results
end
```

does.

kmcq commented

I found a gist with a different way of measuring the process's memory (its resident set size, via ps), and it seems to give us better results:

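```ruby
# Resident set size (RSS) of the current process in kilobytes, via ps.
# The regex skips the "RSS" header line and grabs the number.
def mem
  `ps -o rss -p #{Process.pid}`[/\d+/].to_i
end
```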

With this measurement, removing the compiler and rules pointers makes the leak much smaller, especially removing the compiler pointer. The earlier example that only initializes UserData also no longer shows a leak.

I'll look into ensuring that the yara pointers get cleaned up.
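One way to turn the `mem` helper into a single per-run number is sketched below. This is a minimal sketch, not code from the gem: `average_growth_per_run` is a hypothetical helper, and the measurement function is injectable so the arithmetic can be checked without depending on real process memory.

```ruby
# Resident set size of the current process in kilobytes, as in the thread.
def mem
  `ps -o rss -p #{Process.pid}`[/\d+/].to_i
end

# Hypothetical helper (not part of yara-ffi): calls the block once as a
# warm-up, then `runs` more times, sampling memory after each call.
# Returns the average growth per run in the units `measure` reports
# (kilobytes when `measure` is the default `mem`).
def average_growth_per_run(runs: 100, measure: method(:mem))
  yield # warm-up so one-time allocations are not counted
  samples = [measure.call]
  runs.times do
    yield
    samples << measure.call
  end
  # The consecutive deltas telescope, so their mean is (last - first) / runs.
  (samples.last - samples.first) / runs.to_f
end
```

For example, `average_growth_per_run { Yara.test(rule, "i think we were here that one time") }` would report the average RSS growth per `Yara.test` call for some rule string `rule`; multiplying by 1024 converts kilobytes to bytes.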

kmcq commented

We're leaking about 8,200 bytes per run:

  • Yara::FFI.yr_compiler_create(compiler_pointer) causes a leak of about 200 bytes
  • Yara::FFI.yr_compiler_set_callback(compiler_pointer, error_callback, user_data) maybe causes a few bytes to leak
  • Yara::FFI.yr_compiler_add_string(compiler_pointer, rule_string, nil) causes a leak of about 5,000 bytes.
  • Yara::FFI.yr_compiler_get_rules(compiler_pointer, rules_pointer) causes a leak of about 3,000 bytes.

Removing those calls removes all leaking.

kmcq commented

Adding calls to Yara::FFI.yr_rules_destroy(rules_pointer) and Yara::FFI.yr_compiler_destroy(compiler_pointer) reduces the leak to around 500 bytes per run 🚀
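The fix pairs each native object the library allocates with its matching destroy call in an ensure block. The sketch below shows that pattern without libyara or the ffi gem: `FakeNative`, `create_compiler`, `get_rules`, `destroy`, and `scan_with_cleanup` are hypothetical stand-ins for `Yara::FFI.yr_compiler_create`, `yr_compiler_get_rules`, `yr_rules_destroy`, and `yr_compiler_destroy`, not the gem's actual code. A counter of live handles makes a leak observable.

```ruby
# Hypothetical stand-in for a native library: hands out opaque handles and
# counts how many are still alive, so a missing destroy call shows up.
module FakeNative
  @live = 0

  class << self
    attr_reader :live

    def create_compiler
      @live += 1
      :compiler
    end

    def get_rules(_compiler)
      @live += 1
      :rules
    end

    def destroy(_handle)
      @live -= 1
    end
  end
end

# Every handle that was created is destroyed in `ensure`, so cleanup runs
# even if scanning raises. Locals that were never assigned are nil here.
def scan_with_cleanup(test_string)
  compiler = FakeNative.create_compiler
  rules = FakeNative.get_rules(compiler)
  raise ArgumentError, "empty input" if test_string.empty?

  [] # stand-in for the scan results
ensure
  FakeNative.destroy(rules) if rules
  FakeNative.destroy(compiler) if compiler
end
```

The `if rules` and `if compiler` guards skip handles that were never created, so an exception raised during setup does not trigger a destroy call on nil.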

kmcq commented

The remaining leak breakdown:

  • Yara::FFI.yr_compiler_add_string(compiler_pointer, rule_string, nil) leaks about 250 bytes
  • Yara::FFI.yr_compiler_get_rules(compiler_pointer, rules_pointer) leaks about 250 bytes

kmcq commented

We think the remaining leak may just be ordinary Ruby garbage-collection or malloc behavior rather than anything in the gem.
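One rough way to probe this hypothesis, assuming a true leak keeps growing at a steady rate while allocator or GC noise tends to level off, is to compare growth in the first and second halves of a series of memory samples (for example, values collected with `mem`). The helper `growth_by_half` is hypothetical, and this is a heuristic rather than proof either way.

```ruby
# Hypothetical heuristic: split a series of memory samples in two and
# return how much memory grew during each half. Steady growth in both
# halves suggests a leak; growth that flattens out in the second half is
# more consistent with GC or malloc behavior settling down.
def growth_by_half(samples)
  raise ArgumentError, "need at least 3 samples" if samples.size < 3

  mid = samples.size / 2
  [samples[mid] - samples.first, samples.last - samples[mid]]
end

steady = growth_by_half([0, 10, 20, 30, 40])       # => [20, 20]
settling = growth_by_half([0, 100, 120, 120, 120]) # => [120, 0]
```

Calling `GC.start` before each sample can reduce noise from collections that have not run yet.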

kmcq commented

Closed by #11