Can't build or run marian after a libcublas10 update
rihardsk opened this issue · 1 comments
Bug description
After updating to libcublas10 version 10.2.3.254-1, Marian can no longer locate libcublas.so.10 on its own:
$ ldd marian-decoder
linux-vdso.so.1 (0x00007fffab1f4000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f73a6071000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f73a5e6d000)
libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007f73a1e0c000)
libcusparse.so.10 => /usr/local/cuda/lib64/libcusparse.so.10 (0x00007f739ab85000)
libcublas.so.10 => not found
libtcmalloc_minimal.so.4 => /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 (0x00007f739a93a000)
libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f739a46f000)
libboost_system.so.1.65.1 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.65.1 (0x00007f739a26a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f739a04b000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7399c3e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f73998a0000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7399688000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7399297000)
/lib64/ld-linux-x86-64.so.2 (0x00007f73ab9bd000)
On a system with libcublas10 version 10.2.2.89-1, everything's fine:
$ ldd marian-decoder
linux-vdso.so.1 (0x00007ffe433d0000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f97d97b7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f97d95b3000)
libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007f97d5552000)
libcusparse.so.10 => /usr/local/cuda/lib64/libcusparse.so.10 (0x00007f97ce2cb000)
libcublas.so.10 => /usr/lib/x86_64-linux-gnu/libcublas.so.10 (0x00007f97ca015000)
libtcmalloc_minimal.so.4 => /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4 (0x00007f97c9dca000)
libcrypto.so.1.1 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f97c98ff000)
libboost_system.so.1.65.1 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.65.1 (0x00007f97c96fa000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f97c94db000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f97c9152000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f97c8db4000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f97c8b9c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f97c87ab000)
/lib64/ld-linux-x86-64.so.2 (0x00007f97df103000)
libcublasLt.so.10 => /usr/lib/x86_64-linux-gnu/libcublasLt.so.10 (0x00007f97c6918000)
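To spot the difference between the two outputs quickly, the ldd output can be filtered for unresolved entries. This is only a small sketch; ldd_missing is a hypothetical helper name, not something that ships with Marian or CUDA.

```shell
# Hypothetical helper (not part of Marian): read ldd output on stdin and
# print the name of every library the dynamic linker could not resolve.
ldd_missing() {
  awk '/=> not found/ { print $1 }'
}

# Usage on a real binary:
#   ldd marian-decoder | ldd_missing
```

On the system with libcublas10 10.2.3.254-1 this should print only libcublas.so.10, going by the first listing above.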
This appears to be caused by changes in the libcublas10 package:
10.2.3.254-1:
$ dpkg -L libcublas10
/.
/usr
/usr/local
/usr/local/cuda-10.2
/usr/local/cuda-10.2/targets
/usr/local/cuda-10.2/targets/x86_64-linux
/usr/local/cuda-10.2/targets/x86_64-linux/lib
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10.2.3.254
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so.10.2.3.254
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvblas.so.10.2.3.254
/usr/share
/usr/share/doc
/usr/share/doc/libcublas10
/usr/share/doc/libcublas10/changelog.Debian.gz
/usr/share/doc/libcublas10/copyright
/usr/local/cuda-10.2/lib64
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvblas.so.10
10.2.2.89-1:
$ dpkg -L libcublas10
/.
/usr
/usr/share
/usr/share/doc
/usr/share/doc/libcublas10
/usr/share/doc/libcublas10/changelog.Debian.gz
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/libcublasLt.so.10.2.2.89
/usr/lib/x86_64-linux-gnu/libnvblas.so.10.2.2.89
/usr/lib/x86_64-linux-gnu/libcublas.so.10.2.2.89
/usr/lib/x86_64-linux-gnu/libnvblas.so.10
/usr/lib/x86_64-linux-gnu/libcublas.so.10
/usr/lib/x86_64-linux-gnu/libcublasLt.so.10
The newer version no longer installs anything under /usr/lib/x86_64-linux-gnu, and none of the libcublas.so files are located in any of the default shared library search paths.
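A quick way to confirm this on a given machine is to extract the library directories from the package listing and compare them with what the dynamic linker cache knows about. The sketch below assumes a Debian-style system with dpkg and ldconfig; package_lib_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: read a `dpkg -L` listing on stdin and print each
# directory that contains a versioned shared library file, once.
package_lib_dirs() {
  grep '\.so\.' | sed 's|/[^/]*$||' | sort -u
}

# Usage: see where the package put its libraries, then check whether
# the dynamic linker cache has any entry for libcublas.
#   dpkg -L libcublas10 | package_lib_dirs
#   ldconfig -p | grep libcublas
```

For the 10.2.3.254-1 listing above, the helper should print only /usr/local/cuda-10.2/targets/x86_64-linux/lib.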
This concerns building as well. Running cmake on the latest master on the system with libcublas10 version 10.2.3.254-1 gives:
$ cmake ..
-- The CXX compiler identification is GNU 7.5.0
-- The C compiler identification is GNU 7.5.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Project name: marian
-- Project version: v1.10.25;+ab6b8260
Submodule 'examples' (https://github.com/marian-nmt/marian-examples) registered for path 'examples'
Submodule 'regression-tests' (https://github.com/marian-nmt/marian-regression-tests) registered for path 'regression-tests'
Submodule 'src/3rd_party/fbgemm' (https://github.com/marian-nmt/FBGEMM) registered for path 'src/3rd_party/fbgemm'
Submodule 'src/3rd_party/intgemm' (https://github.com/marian-nmt/intgemm/) registered for path 'src/3rd_party/intgemm'
Submodule 'src/3rd_party/nccl' (https://github.com/marian-nmt/nccl) registered for path 'src/3rd_party/nccl'
Submodule 'src/3rd_party/sentencepiece' (https://github.com/marian-nmt/sentencepiece) registered for path 'src/3rd_party/sentencepiece'
Submodule 'src/3rd_party/simple-websocket-server' (https://github.com/marian-nmt/Simple-WebSocket-Server) registered for path 'src/3rd_party/simple-websocket-server'
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/examples'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/regression-tests'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/fbgemm'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/intgemm'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/nccl'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/sentencepiece'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/simple-websocket-server'...
Submodule path 'examples': checked out '6d5921cc7de91f4e915b59e9c52c9a76c4e99b00'
Submodule path 'regression-tests': checked out '32a2f7960d8cc48d6c90cbb5d03fbb42eb923d3d'
Submodule path 'src/3rd_party/fbgemm': checked out '6f45243cb8ab7d7ab921af18d313ae97144618b8'
Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'src/3rd_party/fbgemm/third_party/asmjit'
Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'src/3rd_party/fbgemm/third_party/cpuinfo'
Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'src/3rd_party/fbgemm/third_party/googletest'
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/fbgemm/third_party/asmjit'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/fbgemm/third_party/cpuinfo'...
Cloning into '/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src/3rd_party/fbgemm/third_party/googletest'...
Submodule path 'src/3rd_party/fbgemm/third_party/asmjit': checked out '4da474ac9aa2689e88d5e40a2f37628f302d7e3c'
Submodule path 'src/3rd_party/fbgemm/third_party/cpuinfo': checked out 'd5e37adf1406cf899d7d9ec1d317c47506ccb970'
Submodule path 'src/3rd_party/fbgemm/third_party/googletest': checked out '0fc5466dbb9e623029b1ada539717d10bd45e99e'
Submodule path 'src/3rd_party/intgemm': checked out '8abde25b13c3ab210c0dec8e23f4944e3953812d'
Submodule path 'src/3rd_party/nccl': checked out '5dcf7751494f9d04057bfc6b4a2b64611bc12253'
Submodule path 'src/3rd_party/sentencepiece': checked out 'c307b874deb5ea896db8f93506e173353e66d4d3'
Submodule path 'src/3rd_party/simple-websocket-server': checked out '1d7e84aeb3f1ebdc78f6965d79ad3ca3003789fe'
CMake Warning at CMakeLists.txt:74 (message):
CMAKE_BUILD_TYPE not set; setting to Release
-- Building with -march=native and intrinsics will be chosen automatically by the compiler to match the current machine.
-- Checking support for CPU intrinsics
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found suitable version "10.1", minimum required is "9.0")
-- Compiling code for Pascal GPUs
-- Compiling code for Volta GPUs
-- Compiling code for Turing GPUs
-- Found CUDA libraries: /usr/local/cuda/lib64/libcurand.so;/usr/local/cuda/lib64/libcusparse.so;CUDA_cublas_LIBRARY-NOTFOUND
-- Found Tcmalloc: /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so
-- Found MKL: -Wl,--start-group;/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a;/opt/intel/mkl/lib/intel64/libmkl_sequential.a;/opt/intel/mkl/lib/intel64/libmkl_core.a;-Wl,--end-group
CMake Warning at src/3rd_party/intgemm/CMakeLists.txt:33 (message):
Not building AVX512VNNI-based multiplication because your compiler is
too old.
For details rerun cmake with --debug-trycompile then try to build in
compile_tests/CMakeFiles/CMakeTmp.
-- VERSION: 0.1.94
-- Found TCMalloc: /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.13") found components: doxygen dot
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_LIBRARY (ADVANCED)
linked by target "marian" in directory /home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/src
-- Configuring incomplete, errors occurred!
See also "/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/build/CMakeFiles/CMakeOutput.log".
See also "/home/TILDE.LV/rihards.krislauks/prog/cpp/marian-ld-test/build/CMakeFiles/CMakeError.log".
I know I can work around this by setting LD_LIBRARY_PATH, but I'm curious what the proper solution is supposed to be here. It's weird that libcublas10 is now packaged in a way that avoids using any of the default shared library search paths.
How to reproduce
Update libcublas10 to version 10.2.3.254-1 and try to run marian-decoder or build the project.
Setting LD_LIBRARY_PATH actually doesn't help when running cmake (which makes sense). Currently, I'm unable to build Marian with the updated libcublas10 library. Is there a way to show cmake where to look for cublas?