ninia/jep

Java Runtime Environment crashed after using SharedInterpreter.invoke() twice

Loc300 opened this issue · 3 comments

The problem:
I need to make predictions with PyTorch from Java, so I'm trying to use Jep to communicate between Java and Python.
On the first iteration I get the expected output, but after the first run I get this large error message:

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007ff815e1f3e4, pid=3753, tid=0x0000000000002803

JRE version: OpenJDK Runtime Environment (8.0_302-b08) (build 1.8.0_302-b08)
Java VM: OpenJDK 64-Bit Server VM (25.302-b08 mixed mode bsd-amd64 compressed oops)
Problematic frame:
C [libsystem_pthread.dylib+0x13e4] pthread_mutex_lock+0x4
Sometimes: C [Python+0x12d713] take_gil+0x30

I'm using
MainInterpreter.setJepLibraryPath()
to set the libjep.jnilib path on macOS.

And
interp.eval("import sys"); and interp.eval("sys.path.append('')");
to add the directory where my other .py modules are defined to the Python path,

interp.runScript() to run the Python file at its path,
and interp.invoke() to call the method with the required parameters.

Could someone help me with this?

I'm using an M1 MacBook Pro

  • Sonoma 14.4 (23E214)
  • Python 3.11
  • JEP==4.2.0
  • Python packages are mostly pytorch, numpy, logging and json

Are you using SharedInterpreter or SubInterpreter? NumPy is not compatible with SubInterpreter and will often fail when it is used in a second SubInterpreter. If that is the problem, then using SharedInterpreter should fix it.

I'm already using SharedInterpreter: SharedInterpreter interp = new SharedInterpreter();

I'm not sure Jep can do anything about a SIGSEGV in libsystem_pthread.dylib. Was there a stacktrace that showed what was calling take_gil or pthread_mutex_lock? If you could create a small, reproducible test case that shows the issue, that would help.
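
When putting together such a test case, it may help to rule out threading first. As far as the Jep documentation describes, an interpreter has to be used on the thread that created it. The sketch below shows one way to funnel every call through a single dedicated thread. It uses only the JDK: ConfinedCalls and FakeInterpreter are hypothetical names, and FakeInterpreter is a stand-in for the real interpreter, not part of Jep. This is not a confirmed cause of or fix for the crash above, only a way to take one variable out of the reproduction.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: funnels every interpreter call through one thread.
class ConfinedCalls implements AutoCloseable {

    // Stand-in for the real interpreter (an assumption for illustration).
    // It remembers its creating thread and rejects calls from any other.
    static final class FakeInterpreter {
        private final Thread owner = Thread.currentThread();

        String invoke(String function, Object arg) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("called from the wrong thread");
            }
            return function + "(" + arg + ") on " + owner.getName();
        }
    }

    // Single daemon worker thread that owns the interpreter.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "py-worker");
        t.setDaemon(true);
        return t;
    });

    private FakeInterpreter interp; // only touched on the worker thread

    ConfinedCalls() {
        // Create the interpreter on the worker thread, not the caller's thread.
        run(() -> {
            interp = new FakeInterpreter();
            return null;
        });
    }

    // Safe to call from any thread; the work itself runs on the worker.
    String invoke(String function, Object arg) {
        return run(() -> interp.invoke(function, arg));
    }

    // Submits a task to the worker and blocks until it has finished.
    private <T> T run(Callable<T> task) {
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        // With a real interpreter, it should also be closed on the worker
        // thread before the executor is shut down.
        worker.shutdown();
    }

    public static void main(String[] args) {
        try (ConfinedCalls calls = new ConfinedCalls()) {
            System.out.println(calls.invoke("predict", 1));
            System.out.println(calls.invoke("predict", 2));
        }
    }
}
```

If the crash still happens when the real interpreter is created, used, and closed entirely inside tasks like these, threading in the Java code is less likely to be the cause, and the native stack trace becomes the more useful lead.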