deeplearning4j/deeplearning4j

Importing an h5 model causes core dumps

nas-sh opened this issue · 5 comments

nas-sh commented

Issue Description

Using the following to import the h5 model leads to an error in the native code:
KerasModelImport.importKerasModelAndWeights(h5ModelPath, false)

The model corresponding to the h5 file was created using Keras, and can be found here.

  • expected behavior: a model loaded into a ComputationGraph object
  • encountered behavior: getting an error in the native call:

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x000000010dee7e31, pid=17133, tid=9731

JRE version: OpenJDK Runtime Environment (18.0.2.1+1) (build 18.0.2.1+1-1)
Java VM: OpenJDK 64-Bit Server VM (18.0.2.1+1-1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
Problematic frame:
V [libjvm.dylib+0x57ce31] jni_SetLongField+0xc1

No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/hs_err_pid17133.log

If you would like to submit a bug report, please visit:
https://bugreport.java.com/bugreport/crash.jsp

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
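For triage, the "Problematic frame" entry is usually the first thing to pull out of an hs_err report. The sketch below extracts it and maps the leading letter to the legend that HotSpot prints in these files (C = native code, V/v = VM code, J = compiled Java, j = interpreted). The class and method names are hypothetical, and the parser assumes the frame sits on the line directly after "Problematic frame:", optionally prefixed with "#".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of DL4J or the JDK.
final class HsErrFrame {

    /** Returns the line following "Problematic frame:", without a leading "#". */
    static Optional<String> problematicFrame(List<String> lines) {
        for (int i = 0; i < lines.size() - 1; i++) {
            if (lines.get(i).contains("Problematic frame:")) {
                return Optional.of(stripMarker(lines.get(i + 1)));
            }
        }
        return Optional.empty();
    }

    private static String stripMarker(String line) {
        return line.replaceFirst("^#\\s*", "").trim();
    }

    /** Maps the frame-type letter to the legend used in hs_err files. */
    static String frameKind(String frame) {
        if (frame.isEmpty()) {
            return "unknown";
        }
        switch (frame.charAt(0)) {
            case 'C': return "native code";
            case 'V':
            case 'v': return "VM code";
            case 'J': return "compiled Java code";
            case 'j': return "interpreted Java code";
            default:  return "unknown";
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: HsErrFrame <path-to-hs_err-log>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        Optional<String> frame = problematicFrame(lines);
        if (frame.isPresent()) {
            System.out.println(frame.get() + " -> " + frameKind(frame.get()));
        } else {
            System.out.println("No problematic frame found");
        }
    }
}
```

For the report above, the frame starts with V, so the fault was raised inside libjvm while it was servicing a JNI call (jni_SetLongField). That often points at the native caller passing an invalid object or field ID rather than at a bug in the JVM itself, though the frame alone does not prove it.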


Version Information

Please indicate relevant versions, including, if relevant:

  • Deeplearning4j version: 1.0.0-M2.1
  • Platform information (OS, etc): "MacBookPro16,1" x86_64 2400 MHz, 16 cores, 64G, Darwin 21.6.0, macOS 12.6.6
  • CUDA version, if used: N/A
  • NVIDIA driver version, if in use: N/A
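Since the behavior may differ between platforms, it can help to paste the JVM's own view of the environment alongside the details above. This is a minimal sketch using only standard system properties; the class name is hypothetical and the output format is just one possible layout.

```java
// Hypothetical helper that prints the platform details a JVM reports about itself.
final class PlatformReport {

    /** Builds a one-line summary from standard JVM system properties. */
    static String describe() {
        return String.format(
                "os.name=%s, os.version=%s, os.arch=%s, java.version=%s, java.vm.name=%s",
                System.getProperty("os.name"),
                System.getProperty("os.version"),
                System.getProperty("os.arch"),
                System.getProperty("java.version"),
                System.getProperty("java.vm.name"));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it on both the failing and the working machine gives a like-for-like comparison of OS, architecture, and JDK.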

Additional Information

Where applicable, please also provide:

  • Full log or exception stack trace (ideally in a Gist: gist.github.com)

You can find the full log here: https://gist.github.com/nas-sh/4d40d33a3599de913ca7e7ba3646ee78#file-hs_err_pid17133-log

  • pom.xml file or similar (also in a Gist)


@nas-sh can you tell me which Keras version this is? Is it newer or older? Native crashes like this are definitely coming from somewhere in the hdf5 package, not from the library itself, so I don't know whether this is a first-party problem. At most this might be a version issue. I'd need to know which Keras and h5 versions were used to see what the difference might be. Because this is a third-party dependency, I'm not sure how much we'll be able to support this, and I won't be forking hdf5 to fix it.
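Before comparing versions, one inexpensive check is whether the file on disk is an intact HDF5 file at all. The sketch below only compares the first 8 bytes against the HDF5 format signature (\211 H D F \r \n \032 \n). It assumes the signature sits at offset 0, which is the common case; the format also allows it at later offsets after a user block, and this check would miss those. The class and method names are hypothetical, and passing the check says nothing about whether the import itself will succeed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical pre-flight check, not part of DL4J.
final class Hdf5SignatureCheck {

    // The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n
    private static final byte[] HDF5_SIGNATURE = {
        (byte) 0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'
    };

    /** Returns true if the file begins with the HDF5 signature at offset 0. */
    static boolean startsWithHdf5Signature(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = in.readNBytes(HDF5_SIGNATURE.length);
            return Arrays.equals(header, HDF5_SIGNATURE);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: Hdf5SignatureCheck <path-to-h5-file>");
            return;
        }
        Path file = Path.of(args[0]);
        System.out.println(file + " starts with HDF5 signature: "
                + startsWithHdf5Signature(file));
    }
}
```

A false result would suggest a truncated or mislabeled download rather than a version mismatch.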

nas-sh commented

We used Keras 2.11.0 on Windows to generate and train the model. Here is our environment.yml file, which includes h5py=3.8.0 and hdf5=1.12.2.

agibsonccc commented

@nas-sh your issue might be platform-specific. I was able to import the model with:

        ComputationGraph model = KerasModelImport.importKerasModelAndWeights("smoke_segmentation.h5");

I'm closing this. If it fails on Linux, please wait for the next release, which is currently in progress (only CUDA testing is left).

nas-sh commented

@agibsonccc on what platform were you able to import the model using the statement above?

I tried it and got the following error message:

Exception in thread "main" org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: Optimizer with name Custom>Adamcan not bematched to a DL4J optimizer. Note that custom TFOptimizers are not supported by model import. Please file an issue at https://github.com/eclipse/deeplearning4j/issues.
at org.deeplearning4j.nn.modelimport.keras.utils.KerasOptimizerUtils.mapOptimizer(KerasOptimizerUtils.java:151)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.importTrainingConfiguration(KerasModel.java:395)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:172)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:97)
at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildModel(KerasModelBuilder.java:311)
at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:167)
at gov.nasa.race.ml.h5import.SmokeSegmentationModel.main(SmokeSegmentationModel.java:29)