Importing an h5 model causes core dumps
nas-sh opened this issue · 5 comments
Issue Description
Using the following to import the h5 model leads to an error in the native code:
KerasModelImport.importKerasModelAndWeights(h5ModelPath, false)
The model corresponding to the h5 file is created using keras, and can be found here.
- expected behavior: a model loaded into a ComputationGraph object
- encountered behavior: getting an error in the native call:
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x000000010dee7e31, pid=17133, tid=9731
JRE version: OpenJDK Runtime Environment (18.0.2.1+1) (build 18.0.2.1+1-1)
Java VM: OpenJDK 64-Bit Server VM (18.0.2.1+1-1, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
Problematic frame:
V [libjvm.dylib+0x57ce31] jni_SetLongField+0xc1
No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
An error report file with more information is saved as:
/hs_err_pid17133.log
If you would like to submit a bug report, please visit:
https://bugreport.java.com/bugreport/crash.jsp
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
Version Information
- Deeplearning4j version: 1.0.0-M2.1
- Platform information (OS, etc): "MacBookPro16,1" x86_64 2400 MHz, 16 cores, 64G, Darwin 21.6.0, macOS 12.6.6
- CUDA version, if used: N/A
- NVIDIA driver version, if in use: N/A
Additional Information
You can find the full log here: https://gist.github.com/nas-sh/4d40d33a3599de913ca7e7ba3646ee78#file-hs_err_pid17133-log
@nas-sh can you give me an overview of which Keras version this is? Is it newer? Older? Native crashes like this are definitely coming from the hdf5 package somewhere, not from this library, so I don't know if this is a first-party problem. At most it might be a version issue; I'd have to know which Keras version produced the h5 file to see what the difference might be. Since this is a 3rd-party dependency I'm not sure how much we'll be able to support this, and I won't be forking hdf5 to fix it.
We used Keras 2.11.0 on Windows to generate and train the model. Here is our environment.yml file, which includes h5py=3.8.0 and hdf5=1.12.2.
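As a quick way to answer the version question from the file itself: Keras HDF5 saves normally store the producing version as root-level attributes. This is a hypothetical helper sketch (not part of the thread's code), assuming the standard Keras h5 layout with `keras_version` and `backend` attributes and using the h5py package already listed in the environment.yml above:

```python
# Sketch: read which Keras version and backend produced a .h5 model file.
# Assumes the standard Keras HDF5 layout, where this metadata is stored
# as attributes on the file's root group.
import h5py


def keras_h5_versions(path):
    """Return the keras_version/backend root attributes of an h5 model file."""
    with h5py.File(path, "r") as f:
        def decode(value):
            # h5py may hand back bytes or str depending on how the
            # attribute was written; normalize to str.
            return value.decode("utf-8") if isinstance(value, bytes) else value

        return {key: decode(f.attrs[key])
                for key in ("keras_version", "backend")
                if key in f.attrs}
```

Running this against the attached model (e.g. `keras_h5_versions("smoke_segmentation.h5")`, filename taken from the comment below) would show exactly which Keras wrote the file, without needing the original training environment.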
@nas-sh your issue might be platform specific. I was able to import the model with:
ComputationGraph model = KerasModelImport.importKerasModelAndWeights("smoke_segmentation.h5");
I'm closing this. If it fails on Linux, please wait for the next release, which is currently in progress (just CUDA testing left).
@agibsonccc on what platform were you able to import the model using the statement above?
I tried it and got the following error message:
Exception in thread "main" org.deeplearning4j.nn.modelimport.keras.exceptions.UnsupportedKerasConfigurationException: Optimizer with name Custom>Adam can not be matched to a DL4J optimizer. Note that custom TFOptimizers are not supported by model import. Please file an issue at https://github.com/eclipse/deeplearning4j/issues.
at org.deeplearning4j.nn.modelimport.keras.utils.KerasOptimizerUtils.mapOptimizer(KerasOptimizerUtils.java:151)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.importTrainingConfiguration(KerasModel.java:395)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.&lt;init&gt;(KerasModel.java:172)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.&lt;init&gt;(KerasModel.java:97)
at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildModel(KerasModelBuilder.java:311)
at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:167)
at gov.nasa.race.ml.h5import.SmokeSegmentationModel.main(SmokeSegmentationModel.java:29)
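This exception is raised while parsing the training configuration, which the original report already side-steps by passing false for the second (enforceTrainingConfig) argument of importKerasModelAndWeights. If the single-argument overload must be used, one possible workaround (an assumption based on the Keras h5 layout, not an official DL4J recipe) is to delete the root-level `training_config` attribute, which is where Keras stores the compiled optimizer entry such as `Custom>Adam`; the architecture (`model_config`) and weights are left untouched. Best run on a copy of the file:

```python
# Sketch: remove the Keras training_config attribute from an h5 model so a
# downstream importer never sees the unsupported custom-optimizer entry.
# Assumes the standard Keras HDF5 layout; operate on a copy of the file.
import h5py


def strip_training_config(path):
    """Delete the root training_config attribute in place.

    Returns True if the attribute was present and removed, False otherwise.
    """
    with h5py.File(path, "r+") as f:
        if "training_config" in f.attrs:
            del f.attrs["training_config"]
            return True
        return False
```

Whether the stripped file then imports cleanly on this platform is untested here; the SIGSEGV reported at the top of the thread may still occur independently of the optimizer issue.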