leovandriel/caffe2_cpp_tutorial

Segmentation fault in malloc_consolidate() when using Intel MKL

Closed this issue · 5 comments

sryap commented

Hi,

I encountered a segmentation fault in malloc_consolidate() when running the MNIST training with Intel MKL. The same program works fine with OpenBLAS, but I would like to use Intel MKL.

The segmentation fault occurred after training had finished, while the program was exiting.

Here is the backtrace from GDB.

Program received signal SIGSEGV, Segmentation fault.
0x00002aaab3be55d3 in malloc_consolidate () from /usr/lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7_4.2.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x86_64 libgcc-4.8.5-16.el7_4.1.x86_64 libselinux-2.5-11.el7.x86_64 libstdc++-4.8.5-16.el7_4.1.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00002aaab3be55d3 in malloc_consolidate () from /usr/lib64/libc.so.6
#1  0x00002aaab3be64fe in _int_free () from /usr/lib64/libc.so.6
#2  0x00002aaaba1cc22d in mkl_serv_free_buffers ()
   from /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin/libmkl_core.so
#3  0x00002aaaba1dcb26 in mkl_core_fini ()
   from /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin/libmkl_core.so
#4  0x00002aaaaaabab3a in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#5  0x00002aaab3ba2a69 in __run_exit_handlers () from /usr/lib64/libc.so.6
#6  0x00002aaab3ba2ab5 in exit () from /usr/lib64/libc.so.6
#7  0x00002aaab3b8bc0c in __libc_start_main () from /usr/lib64/libc.so.6
#8  0x000000000060e5a3 in _start ()

Please let me know if you have any thoughts on how to solve this problem.

Thanks!

Hi @sarunyap. I haven't run the tutorials or compiled Caffe2 with MKL. Have you tried running this with additional debug flags? Also, did you try running one of Caffe2's own tutorials or demo projects?

sryap commented

Hi @leonardvandriel. I haven't tried running it with debug flags, and I haven't run any of Caffe2's tutorials. I'm not aware of any other Caffe2 C++ tutorials. Could you please point me to the ones you know of? Thanks!

About the debug flags: there's a debug target in the Makefile that rebuilds with debug symbols. The GDB output also lists the debuginfo packages to install (the debuginfo-install line) to get symbols for the system libraries.

I don't know of any C++ tutorials, unfortunately. Sorry about that. I was referring to the Python tutorials on the Caffe2 website. I just wanted to see if the error is related to this repository, or if it's a general Caffe2 issue.

sryap commented

Thanks! I will try the debug flags and also try the Python tutorials. I'll let you know if I find the cause of this error.

sryap commented

I found that it's a linking problem. My program was linked against multiple MKL libraries. I rebuilt the program and linked it against only one MKL library, and the problem is solved. Thanks for your help!
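
For anyone hitting the same crash, one way to check which MKL shared objects actually end up loaded in the process is to ask the dynamic loader at run time. The following is only a minimal sketch, not part of this repository: it assumes Linux with glibc (dl_iterate_phdr is a glibc extension) and assumes MKL libraries follow the libmkl_*.so naming seen in the backtrace above. The helper names is_mkl_library and loaded_mkl_libraries are made up for illustration. The thread does not say which MKL libraries were involved, so the sketch only lists them and does not decide which combination is wrong.

```cpp
// Linux/glibc only: dl_iterate_phdr is declared in <link.h>.
#include <link.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if the file name part of `path` starts with
// "libmkl_" (assumed naming, e.g. libmkl_core.so from the backtrace).
bool is_mkl_library(const std::string& path) {
  const std::size_t slash = path.find_last_of('/');
  const std::string name =
      (slash == std::string::npos) ? path : path.substr(slash + 1);
  return name.compare(0, 7, "libmkl_") == 0;
}

// Callback for dl_iterate_phdr: invoked once per object loaded in the process.
static int collect_mkl(struct dl_phdr_info* info, std::size_t /*size*/,
                       void* data) {
  assert(data != nullptr);
  auto* out = static_cast<std::vector<std::string>*>(data);
  if (info->dlpi_name != nullptr && is_mkl_library(info->dlpi_name)) {
    out->push_back(info->dlpi_name);
  }
  return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Hypothetical helper: paths of all currently loaded shared objects whose
// file name looks like an MKL library.
std::vector<std::string> loaded_mkl_libraries() {
  std::vector<std::string> libs;
  dl_iterate_phdr(collect_mkl, &libs);
  return libs;
}
```

Calling loaded_mkl_libraries() from main shortly before exit and printing the returned paths shows every MKL shared object mapped into the process. Comparing that list before and after changing the link line is a quick way to confirm that the rebuilt binary really depends on only the intended library.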