BVLC/caffe

Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so

linhj184169280 opened this issue · 29 comments

The BLAS I chose in Makefile.config is atlas, and I compiled Caffe with pycaffe.

make test and make runtest both pass, but when I run "import caffe" in Python, it fails with "Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so". What is wrong with my Caffe build?

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

When reporting a bug, it's most helpful to provide the following information, where applicable:

  • What steps reproduce the bug?
  • Can you reproduce the bug using the latest master, compiled with the DEBUG make option?
  • What hardware and operating system/distribution are you running?
  • If the bug is a crash, provide the backtrace (usually printed by Caffe; always obtainable with gdb).

Did you compile pycaffe with MKL? If you compiled with MKL in the past, run make clean before recompiling.

Are you using Anaconda? The problem might not be related to Caffe. Try

python -c 'import sklearn.linear_model.tests.test_randomized_l1'

If you can reproduce the error, the problem is not related to Caffe but to Anaconda.
The latest versions of numpy and scipy use MKL by default. If you want to disable that, you can run

conda install nomkl

That solved my problem. I hope it solves yours, too.
More details:
scikit-learn/scikit-learn#5046
https://www.continuum.io/blog/developer-blog/anaconda-25-release-now-mkl-optimizations
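The reproduction check suggested above can be scripted so it gives a clear yes/no answer. This is a minimal bash sketch, not part of Caffe or conda: the function name reproduces_mkl_error is hypothetical, and it only looks for the "Intel MKL FATAL ERROR" text in the command's combined output, so other failure modes are not flagged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and report whether its output contains
# the Intel MKL loader error. Returns 0 if the error was reproduced, 1 otherwise.
reproduces_mkl_error() {
  local output
  # Capture stdout and stderr together; the command's own exit status is ignored.
  output=$("$@" 2>&1) || true
  case "$output" in
    *"Intel MKL FATAL ERROR"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (requires the Anaconda Python under suspicion):
#   if reproduces_mkl_error python -c 'import sklearn.linear_model.tests.test_randomized_l1'; then
#     echo "MKL error reproduced without Caffe: check the Anaconda/MKL setup"
#   fi
```

Because the check runs the same Python interpreter without importing Caffe, a positive result points at the numpy/scipy/MKL installation rather than at the Caffe build.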

jskDr commented

conda install nomkl numpy scipy scikit-learn numexpr
conda remove mkl mkl-service

The answer above may not be a complete solution. If the problem still occurs, Anaconda should switch its default to nomkl as soon as possible, at least on Ubuntu. What do you think?

Closing due to lack of reply from @linhj184169280, and to clean up the Issues page.

Hi,

Just wanted to note that Anaconda 4.0.0, which ships with MKL enabled by default, has this issue.
The problem is indeed with Anaconda, as it can be reproduced with the python sklearn test suggested above by @pcgreat.

The actual issue is that Anaconda's libraries are linked against MKL but not against libmkl_core.so, so a symbol is missing at load time. This can be seen by running:

$ LD_DEBUG=symbols python -c 'import sklearn.linear_model.tests.test_randomized_l1' 2>&1 | grep -i error
      2200:     /opt/anaconda/lib/python2.7/site-packages/scipy/special/../../../../libmkl_avx.so: error: symbol lookup error: undefined symbol: mkl_dft_fft_fix_twiddle_table_32f (fatal)

I didn't want to uninstall MKL, as I'd like to keep the performance boost, so I found a workaround that worked for me: preload libmkl_core.so before execution.

$ python -c 'import sklearn.linear_model.tests.test_randomized_l1'
Intel MKL FATAL ERROR: Cannot load libmkl_avx.so or libmkl_def.so.
$
$ LD_PRELOAD=/opt/anaconda/lib/libmkl_core.so python -c 'import sklearn.linear_model.tests.test_randomized_l1'
$

Regards,
Yanir.
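The LD_DEBUG check from the comment above prints a lot of output; it can be narrowed down to just the missing symbol names. A minimal bash sketch, assuming POSIX sed and the loader message format "undefined symbol: NAME" seen in the log above; the function name undefined_symbols is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read dynamic-loader output on stdin and print each
# distinct symbol reported as "undefined symbol: NAME", one per line.
undefined_symbols() {
  # Keep only the token after "undefined symbol: ", stopping at a space, comma or "(".
  sed -n 's/.*undefined symbol: \([^ ,(]*\).*/\1/p' | sort -u
}

# Example (LD_DEBUG writes to stderr, hence 2>&1):
#   LD_DEBUG=symbols python -c 'import sklearn.linear_model.tests.test_randomized_l1' 2>&1 | undefined_symbols
```

The resulting names can then be looked up in the MKL libraries to decide which one needs to be preloaded.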

@yanirj To use MKL properly, you need to set up its environment with the provided script, usually something like:
source /opt/intel/mkl/bin/mklvars.sh intel64

This is just an example: MKL may be installed in a different directory, and the argument must match the target architecture (intel64 in this case). More options are available, but this is the most common one. If you haven't used the script yet, please try it and let us know whether it sorts out the issue you are observing.

Regards,
Jacek

ibmua commented

Updating via
conda install mkl
solved it for me. It seems to have updated several modules including mkl, mkl-service and numpy.

Thanks @jskDr! Your solution helped me!

Hello: I found this thread while researching this MKL error, and summarized my answer here (related thread):

https://github.com/ContinuumIO/anaconda-issues/issues/720

TLDR:

conda install -f numpy

worked for me;

conda install mkl

did not. :-)

I solved the problem by following this tutorial: https://docs.continuum.io/mkl-optimizations/. The commands are:
1. conda update conda
2. conda update anaconda
3. conda update mkl

I had this issue with gensim. This worked:

$ bash Anaconda-xxxxxx  # script name for the fresh install
$ pip install --upgrade gensim
$ conda install mkl

Strangely, swapping the last two steps does not work.

$ bash Anaconda-xxxxxx  # script name for the fresh install
$ conda install mkl
$ pip install --upgrade gensim

Following the instructions from @victoriastuart and @ujsyehao, I updated mkl and anaconda. That removed the original error, but a new one appeared:

Intel MKL FATAL ERROR: Error on loading function mkl_lapack_ps_mc3_dgetrf_small.

So I removed mkl and installed nomkl following @pcgreat and @jskDr. It works.
Thank you all.

I had the same problem, and it went away after updating Anaconda to the latest version (4.3.0 with Python 3.6).

A heads-up for anyone else who ends up here: this error can sometimes be a red herring. I recently got a similar error because I inadvertently ran a script from inside a mounted directory; behind the scenes it checks the current working directory and can't make sense of where things are.

Thanks @jskDr! I solved it using your commands, but I didn't need to remove mkl. My problem was importing scikit-image in a conda environment.

Same here: there was no need to remove mkl. Thanks @jskDr for helping me out.

wgong commented

conda install nomkl

worked for me

With either conda update mkl or conda install nomkl, I receive the following message and I'm not sure what to do.
For the first command:

anaconda: 4.4.0-np112py36_0 --> custom-py36_0

What are the downsides of this action if I answer "yes"?
Would I need to perform updates differently later on, or run Python code differently afterwards?

I had this same issue using scikit-learn 0.19 and numpy 1.13.3 when running MLPRegressor (and also with a package called pyearth running an algorithm called MARS). I believe the root of the problem was that our Python is part of an Anaconda install, but scikit-learn and numpy were installed via pip, and their expectations about MKL apparently did not agree.

Unfortunately my framework is managed by some dedicated company admins, not by me, so I haven't gotten my guy to try recompiling numpy yet. But I was able to find a workaround based on this thread. Adding the following line to my ~/.bashrc makes the problem disappear:

export LD_PRELOAD=/path/to/anaconda/lib/libmkl_def.so:/path/to/anaconda/lib/libmkl_avx.so:/path/to/anaconda/lib/libmkl_core.so:/path/to/anaconda/lib/libmkl_intel_lp64.so:/path/to/anaconda/lib/libmkl_intel_thread.so:/path/to/anaconda/lib/libiomp5.so

It's super hacky, and I'd be lying if I said I knew exactly what it's doing (but this is helpful), so I'm hoping a recompile of numpy is a cleaner fix. But at least it works.

Just some more info: installing "nomkl" is not a solution! It simply disables MKL, falling back to much slower functions.
Trying LD_PRELOAD=~/miniconda3/lib/libmkl_avx2.so resulted in libmkl_avx2.so: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8
Similarly, LD_PRELOAD=~/miniconda3/lib/libmkl_sequential.so resulted in libmkl_sequential.so: undefined symbol: mkl_spblas_ccsr0nd_uc__mmout_seq

Hence we need to find the libraries that export those symbols:

find ~/miniconda3/lib/ -name "libmkl*" -exec nm --print-file -D {} \; | grep mkl_sparse_optimize_bsr_trsm_i8

That got me: libmkl_intel_thread.so, libmkl_sequential.so, libmkl_tbb_thread.so, libmkl_pgi_thread.so, libmkl_gnu_thread.so

So preloading libmkl_sequential.so should resolve the first symbol, but the second one remains. The same search for it:

find ~/miniconda3/lib/ -name "libmkl*" -exec nm --print-file -D {} \; | grep mkl_spblas_ccsr0nd_uc__mmout_seq

gave me libmkl_core.so.
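Note that nm -D without further filtering also lists symbols a library merely references (type U), so the grep above can match libraries that need the symbol rather than provide it. A minimal bash sketch that keeps only libraries that define it, assuming GNU binutils nm (for -D and --defined-only) and awk; find_mkl_symbol is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every libmkl*.so* in DIR whose dynamic symbol
# table defines SYMBOL. Undefined references (nm type "U") are excluded.
find_mkl_symbol() {
  local dir=$1 sym=$2 lib
  for lib in "$dir"/libmkl*.so*; do
    [ -e "$lib" ] || continue   # the glob matched nothing
    # $NF is the symbol name; strip any "@VERSION" suffix before comparing.
    if nm -D --defined-only "$lib" 2>/dev/null |
       awk -v s="$sym" '{ n = $NF; sub(/@.*/, "", n); if (n == s) found = 1 } END { exit !found }'; then
      printf '%s\n' "$lib"
    fi
  done
}

# Example:
#   find_mkl_symbol ~/miniconda3/lib mkl_spblas_ccsr0nd_uc__mmout_seq
```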

TLDR:
So this works: LD_PRELOAD=~/miniconda3/lib/libmkl_core.so:~/miniconda3/lib/libmkl_sequential.so, which is exactly what is written in http://debugjournal.tumblr.com/post/98401758462/intel-mkl-dynamic-link-library-error (original: https://stackoverflow.com/a/21079900/1930508, from https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/300857#comment-1627042)

Note: libmkl_sequential might not be the best choice for performance. So you could try one of the thread libraries instead.
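Following the note above, the preload list depends on which MKL threading layer you pick. A minimal bash sketch that builds the LD_PRELOAD value for a given install prefix; mkl_preload_for_layer is a hypothetical name, only two layers are covered, and pairing libmkl_intel_thread.so with the Intel OpenMP runtime libiomp5.so is an assumption based on the library lists posted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a colon-separated LD_PRELOAD value containing
# libmkl_core.so plus one threading layer from PREFIX/lib.
#   LAYER=sequential -> libmkl_sequential.so (no threading)
#   LAYER=intel      -> libmkl_intel_thread.so + libiomp5.so (assumed pairing)
mkl_preload_for_layer() {
  local prefix=$1 layer=$2 libs lib out=""
  case "$layer" in
    sequential) libs="libmkl_core.so libmkl_sequential.so" ;;
    intel)      libs="libmkl_core.so libmkl_intel_thread.so libiomp5.so" ;;
    *) echo "unknown layer: $layer" >&2; return 1 ;;
  esac
  for lib in $libs; do   # unquoted on purpose: split the list on spaces
    out=${out:+$out:}$prefix/lib/$lib
  done
  printf '%s\n' "$out"
}

# Example:
#   LD_PRELOAD=$(mkl_preload_for_layer ~/miniconda3 sequential) python -c 'import numpy'
```

The sequential layer is the combination confirmed to work above; the threaded variant is worth trying only if it actually loads on your installation.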

@pcgreat, it worked for me. Thanks.

Note: conda install nomkl means: "Remove mkl and replace by a (slow) standard version"

The following worked for me:
conda install -f numpy

still not fixed!! :-(

The workaround proposed by @Flamefire worked best for me!

I had a similar issue using Faiss, and this worked. Solution sourced from https://www.programmersought.com/article/10826550193/ and https://blog.csdn.net/qikaihuting/article/details/103526376.
Add the following line to your ~/.bashrc file:

In my case (WSL2 Ubuntu on Windows), the Intel MKL libraries were installed in /home/sashi/anaconda3/lib/; update the paths in the following line to point to the appropriate folder on your machine.

export LD_PRELOAD=/home/sashi/anaconda3/lib/libmkl_def.so:/home/sashi/anaconda3/lib/libmkl_avx.so:/home/sashi/anaconda3/lib/libmkl_core.so:/home/sashi/anaconda3/lib/libmkl_intel_lp64.so:/home/sashi/anaconda3/lib/libmkl_intel_thread.so:/home/sashi/anaconda3/lib/libiomp5.so
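Rather than hardcoding one user's home directory, the same line can be built from a library directory while skipping any library that is not present, since the exact set of MKL files can differ between installations. A minimal bash sketch; build_preload is a hypothetical name, and the ~/.bashrc usage assumes CONDA_PREFIX is already set (it is set by conda activate).

```shell
#!/usr/bin/env bash
# Hypothetical helper: join DIR/lib for each existing library into a
# colon-separated LD_PRELOAD value; missing libraries are reported on stderr.
build_preload() {
  local dir=$1 lib out=""
  shift
  for lib in "$@"; do
    if [ -f "$dir/$lib" ]; then
      out=${out:+$out:}$dir/$lib
    else
      echo "skipping missing $dir/$lib" >&2
    fi
  done
  printf '%s\n' "$out"
}

# Example ~/.bashrc usage (same library list as the line above):
#   export LD_PRELOAD=$(build_preload "$CONDA_PREFIX/lib" libmkl_def.so libmkl_avx.so \
#     libmkl_core.so libmkl_intel_lp64.so libmkl_intel_thread.so libiomp5.so)
```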

For me downgrading mkl solved it:
conda install mkl=2021.2.0
(Ubuntu 21.04, python 3.8)