ninia/jep

Memory Retention After Executing Python Function in SharedInterpreter

Closed this issue · 7 comments

Describe the bug
When using the Jep library, it appears that defining and executing a Python function within a SharedInterpreter instance causes memory to be retained, even after closing the interpreter.

To Reproduce

  1. Create a new SharedInterpreter instance and execute a Python function definition and invocation using the following code:
jepInterpreter = new SharedInterpreter();
jepInterpreter.exec("def foo():\n\tpass\nfoo()");
  1. Execute a subsequent memory-intensive operation:
jepInterpreter.exec("x=[10000]*100000000");

Observe a memory spike in the JVM.

  1. Close the interpreter:
jepInterpreter.close();

Observe that the memory used by the interpreter is not released.

Expected behavior
The memory used by the interpreter should be released after the interpreter is closed.

Environment :

  • OS Platform, Distribution, and Version: Fedora Linux 40 : Linux 6.8.11-300.fc40.x86_64
  • Python Distribution and Version: 3.9
  • Java Distribution and Version: java-17-amazon-corretto
  • Jep Version: 4.0.3

Additional context
Observed that this issue also happens on jep==4.2.0
If the line defining and invoking the function jepInterpreter.exec("def foo():\n\tpass\nfoo()"); is removed, the memory is released as expected after closing the interpreter.

Thank you for providing clear instructions to easily recreate the problem.

The problem is that jep relies on Python reference counting to clear memory during close(). Python reference counting has a limitation that it cannot clear circular references. In this case the foo() function maintains a reference to the globals in foo.__globals__ and the globals has a reference to foo which introduces a cycle that reference counting alone cannot free and since it involves globals everything in globals is kept in memory.

The good news is that the memory is not lost forever, if you have other SharedInterpreters running or if you open new SharedInterpreters then the Python garbage collection(gc) will eventually run and free the memory. I was able to free the memory by opening a new SharedInterpreter and running gc manually like this:

try (SharedInterpreter interpreter = new SharedInterpreter()) {
    interpreter.exec("def foo():\n\tpass\nfoo()");
    interpreter.exec("x=[10000]*100000000");
}
Thread.sleep(30000); // observe the memory
try (SharedInterpreter interpreter = new SharedInterpreter()) {
    interpreter.exec("import gc");
    interpreter.exec("gc.collect()");
}

Another workaround for existing versions of jep is to run gc manually before close, this worked for me also:

try (SharedInterpreter interpreter = new SharedInterpreter()) {
    interpreter.exec("def foo():\n\tpass\nfoo()");
    interpreter.exec("x=[10000]*100000000");
    interpreter.exec("globals().clear()\nimport gc\ngc.collect()");
}

I agree this is something that Jep should handle better so I have opened #556 to have jep run gc on close(). That will resolve the problem in future versions of Jep. If you have the ability to build Jep with the change in the PR and test it then I would appreciate confirmation that this fixes the problem in any more complex use cases you might have.

Thanks a lot for the quick turnaround! I had to replace interpreter.exec("globals().clear()\nimport gc\ngc.collect()"); with interpreter.exec("globals().clear()"); . I was getting gettattr not defined error, probably because of clearing out globals. I unfortunately could not test out your fix in #556. Does interpreter.exec("globals().clear()\nimport gc\ngc.collect()"); work when there is a single SharedInterpreters?

Thanks a lot for the quick turnaround! I had to replace interpreter.exec("globals().clear()\nimport gc\ngc.collect()"); with interpreter.exec("globals().clear()"); .

In this case that would be enough to break the reference cycle so reference counting can clear the memory without gc.

I was getting gettattr not defined error, probably because of clearing out globals. I unfortunately could not test out your fix in #556.

I have not seen this error in my tests. I tried it on both python3.10 and python3.9. I am just using the 5 lines from my last comment to test. If you have any idea how to recreate this error it would be helpful because I would like to make sure the fix in the PR is not causing the same error.

Does interpreter.exec("globals().clear()\nimport gc\ngc.collect()"); work when there is a single SharedInterpreters?

It should work regardless of whether there is one or multiple interpreters. I have only tested it with a single interpreter. the collect() might take slightly longer with more SharedInterpreters because there are more objects to traverse but I wouldn't expect it to be noticeable in most cases.

Thanks a lot for the quick turnaround! I had to replace interpreter.exec("globals().clear()\nimport gc\ngc.collect()"); with interpreter.exec("globals().clear()"); .

In this case that would be enough to break the reference cycle so reference counting can clear the memory without gc.

I was getting gettattr not defined error, probably because of clearing out globals. I unfortunately could not test out your fix in #556.

I have not seen this error in my tests. I tried it on both python3.10 and python3.9. I am just using the 5 lines from my last comment to test. If you have any idea how to recreate this error it would be helpful because I would like to make sure the fix in the PR is not causing the same error.

Does interpreter.exec("globals().clear()\nimport gc\ngc.collect()"); work when there is a single SharedInterpreters?

It should work regardless of whether there is one or multiple interpreters. I have only tested it with a single interpreter. the collect() might take slightly longer with more SharedInterpreters because there are more objects to traverse but I wouldn't expect it to be noticeable in most cases.

This reason for this failure was code-specific. We were doing something that basically was calling a gettattr which would have been subsequently deleted due globals().clear()

Hey @bsteffensmeier . I found another memory leak. If you have a class method that is cached ( in my case functools.lru_cache) it leaks memory. This probably extends to the fact that static class variables and methods are not gc'ed. I will create a minimal reproduction and post here, was curious if you have seen something like this before.

Hey @bsteffensmeier . I found another memory leak. If you have a class method that is cached ( in my case functools.lru_cache) it leaks memory. This probably extends to the fact that static class variables and methods are not gc'ed. I will create a minimal reproduction and post here, was curious if you have seen something like this before.

I have not seen something like this before, were you able to create a minimal reproduction of this? With shared interpreters it is expected that some shared state will not be gc'ed so this might not be a bug, if there are significant leaks in jep we may need to look at adding some methods to clear out some shared state but if you are using something like lru_cache which deliberately increases memory usage then you might need to have your own cleanup to remove it.

It has been some while so I don't exactly remember what the resolution was but I am pretty sure that it was nothing on the jep side. Thanks for the fix for the original issue! I'll close this out 💯