ralna/spral

-lcudart_static and -lcublas not found using meson build system


Hello,

I've been trying to compile SPRAL for a while now so that I can use it with Ipopt. First, when compiling with Autotools and running make check, the ssids_test test fails with a segmentation fault. Now I'm trying to compile it with the Meson build system, also without luck. I would appreciate any help with this.

These are the outputs corresponding to the commands in the README file:

meson setup builddir -Dexamples=true -Dtests=true -Dlibblas=openblas -Dliblapack=openblas -Dlibmetis=coinmetis

The Meson build system
Version: 1.4.0
Source dir: /home/ar612/Installers/SPRAL/spral
Build dir: /home/ar612/Installers/SPRAL/spral/builddir
Build type: native build
Project name: SPRAL
Project version: 2024.05.08
Fortran compiler for the host machine: gfortran (gcc 11.4.0 "GNU Fortran (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
Fortran linker for the host machine: gfortran ld.bfd 2.38
C compiler for the host machine: cc (gcc 11.4.0 "cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
C linker for the host machine: cc ld.bfd 2.38
C++ compiler for the host machine: c++ (gcc 11.4.0 "c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
C++ linker for the host machine: c++ ld.bfd 2.38
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Library openblas found: YES
Library openblas found: YES
Library coinmetis found: YES
Library hwloc found: YES
Run-time dependency CUDA (modules: cudart_static, rt, pthread, dl, cublas) found: YES 12.5 (/usr/local/cuda-12.5)
Library m found: YES
Has header "cblas.h" : YES 
Has header "hwloc.h" : YES 
Build targets in project: 45

SPRAL 2024.05.08

  User defined options
    examples : true
    libblas  : openblas
    liblapack: openblas
    libmetis : coinmetis
    tests    : true

Found ninja-1.10.1 at /usr/bin/ninja

meson compile -C builddir

INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/ar612/Installers/SPRAL/spral/builddir
ninja: Entering directory `/home/ar612/Installers/SPRAL/spral/builddir'
[118/195] Linking target libspral.so
FAILED: libspral.so 
gfortran  -o libspral.so libspral.so.p/interfaces_C_lsmr.f90.o libspral.so.p/interfaces_C_matrix_util.f90.o libspral.so.p/interfaces_C_random.f90.o libspral.so.p/interfaces_C_random_matrix.f90.o libspral.so.p/interfaces_C_rutherford_boeing.f90.o libspral.so.p/interfaces_C_scaling.f90.o libspral.so.p/interfaces_C_ssids.f90.o libspral.so.p/interfaces_C_ssmfe.f90.o libspral.so.p/interfaces_C_ssmfe_core.f90.o libspral.so.p/interfaces_C_ssmfe_expert.f90.o libspral.so.p/src_cuda_cuda.f90.o libspral.so.p/src_hw_topology_hw_topology.f90.o libspral.so.p/src_ssids_cpu_cpu_iface.f90.o libspral.so.p/src_ssids_cpu_subtree.f90.o libspral.so.p/src_ssids_gpu_alloc.f90.o libspral.so.p/src_ssids_gpu_cpu_solve.f90.o libspral.so.p/src_ssids_gpu_datatypes.f90.o libspral.so.p/src_ssids_gpu_dense_factor.f90.o libspral.so.p/src_ssids_gpu_factor.f90.o libspral.so.p/src_ssids_gpu_interfaces.f90.o libspral.so.p/src_ssids_gpu_smalloc.f90.o libspral.so.p/src_ssids_gpu_solve.f90.o libspral.so.p/src_ssids_gpu_subtree.f90.o libspral.so.p/src_ssids_akeep.f90.o libspral.so.p/src_ssids_anal.F90.o libspral.so.p/src_ssids_contrib.f90.o libspral.so.p/src_ssids_contrib_free.f90.o libspral.so.p/src_ssids_datatypes.f90.o libspral.so.p/src_ssids_fkeep.F90.o libspral.so.p/src_ssids_inform.f90.o libspral.so.p/src_ssids_profile_iface.f90.o libspral.so.p/src_ssids_ssids.f90.o libspral.so.p/src_ssids_subtree.f90.o libspral.so.p/src_ssmfe_core.f90.o libspral.so.p/src_ssmfe_expert.f90.o libspral.so.p/src_ssmfe_ssmfe.f90.o libspral.so.p/src_blas_iface.f90.o libspral.so.p/src_core_analyse.f90.o libspral.so.p/src_lapack_iface.f90.o libspral.so.p/src_lsmr.f90.o libspral.so.p/src_match_order.f90.o libspral.so.p/src_matrix_util.f90.o libspral.so.p/src_pgm.f90.o libspral.so.p/src_random.f90.o libspral.so.p/src_random_matrix.f90.o libspral.so.p/src_rutherford_boeing.f90.o libspral.so.p/src_scaling.f90.o libspral.so.p/src_timer.f90.o libspral.so.p/src_metis5_wrapper.F90.o libspral.so.p/src_hw_topology_guess_topology.cxx.o 
libspral.so.p/src_ssids_cpu_kernels_cholesky.cxx.o libspral.so.p/src_ssids_cpu_kernels_ldlt_app.cxx.o libspral.so.p/src_ssids_cpu_kernels_ldlt_nopiv.cxx.o libspral.so.p/src_ssids_cpu_kernels_ldlt_tpp.cxx.o libspral.so.p/src_ssids_cpu_kernels_wrappers.cxx.o libspral.so.p/src_ssids_cpu_NumericSubtree.cxx.o libspral.so.p/src_ssids_cpu_SymbolicSubtree.cxx.o libspral.so.p/src_ssids_cpu_ThreadStats.cxx.o libspral.so.p/src_ssids_profile.cxx.o libspral.so.p/src_compat.cxx.o libspral.so.p/src_omp.cxx.o libspral.so.p/src_cuda_api_wrappers.cu.o libspral.so.p/src_ssids_gpu_kernels_assemble.cu.o libspral.so.p/src_ssids_gpu_kernels_dense_factor.cu.o libspral.so.p/src_ssids_gpu_kernels_reorder.cu.o libspral.so.p/src_ssids_gpu_kernels_solve.cu.o libspral.so.p/src_ssids_gpu_kernels_syrk.cu.o -L/usr/lib/gcc/x86_64-linux-gnu/11 -L/usr/lib/gcc/x86_64-linux-gnu/11/../../../x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib -L/usr/lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/11/../../.. -L/lib -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,-soname,libspral.so -fopenmp -Wl,--start-group -lstdc++ -lopenblas -lopenblas -lcoinmetis -lhwloc -lrt -lpthread -ldl -lcudart_static -lcublas -lm -lgfortran -Wl,--end-group
/usr/bin/ld: cannot find -lcudart_static: No such file or directory
/usr/bin/ld: cannot find -lcublas: No such file or directory
collect2: error: ld returned 1 exit status
[119/195] Compiling C++ object kernelst_cpp.p/tests_ssids_kernels_ldlt_app.cxx.o
ninja: build stopped: subcommand failed.

However, the CUDA library path is included in LD_LIBRARY_PATH:

echo $LD_LIBRARY_PATH

/usr/local/lib:/usr/local/cuda-12.5/lib64:/usr/local/cuda-12.5/lib64:/usr/local/cuda-12.5/lib64:

ls /usr/local/cuda-12.5/lib64

cmake                         libcufftw.so.11           libcusolver_lapack_static.a  libnppicc.so.12.3.0.116   libnppisu.so.12.3.0.116  libnvjpeg_static.a
libaccinj64.so                libcufftw.so.11.2.3.18    libcusolver_metis_static.a   libnppicc_static.a        libnppisu_static.a       libnvperf_host.so
libaccinj64.so.12.5           libcufftw_static.a        libcusolverMg.so             libnppidei.so             libnppitc.so             libnvperf_host_static.a
libaccinj64.so.12.5.39        libcufile_rdma.so         libcusolverMg.so.11          libnppidei.so.12          libnppitc.so.12          libnvperf_target.so
libcheckpoint.so              libcufile_rdma.so.1       libcusolverMg.so.11.6.2.40   libnppidei.so.12.3.0.116  libnppitc.so.12.3.0.116  libnvptxcompiler_static.a
libcublasLt.so                libcufile_rdma.so.1.10.0  libcusolver.so               libnppidei_static.a       libnppitc_static.a       libnvrtc-builtins.so
libcublasLt.so.12             libcufile_rdma_static.a   libcusolver.so.11            libnppif.so               libnpps.so               libnvrtc-builtins.so.12.5
libcublasLt.so.12.5.2.13      libcufile.so              libcusolver.so.11.6.2.40     libnppif.so.12            libnpps.so.12            libnvrtc-builtins.so.12.5.40
libcublasLt_static.a          libcufile.so.0            libcusolver_static.a         libnppif.so.12.3.0.116    libnpps.so.12.3.0.116    libnvrtc-builtins_static.a
libcublas.so                  libcufile.so.1.10.0       libcusparse.so               libnppif_static.a         libnpps_static.a         libnvrtc.so
libcublas.so.12               libcufile_static.a        libcusparse.so.12            libnppig.so               libnvblas.so             libnvrtc.so.12
libcublas.so.12.5.2.13        libcufilt.a               libcusparse.so.12.4.1.24     libnppig.so.12            libnvblas.so.12          libnvrtc.so.12.5.40
libcublas_static.a            libcuinj64.so             libcusparse_static.a         libnppig.so.12.3.0.116    libnvblas.so.12.5.2.13   libnvrtc_static.a
libcudadevrt.a                libcuinj64.so.12.5        libmetis_static.a            libnppig_static.a         libnvfatbin.so           libnvToolsExt.so
libcudart.so                  libcuinj64.so.12.5.39     libnppc.so                   libnppim.so               libnvfatbin.so.12        libnvToolsExt.so.1
libcudart.so.12               libculibos.a              libnppc.so.12                libnppim.so.12            libnvfatbin.so.12.5.39   libnvToolsExt.so.1.0.0
libcudart.so.12.5.39          libcupti.so               libnppc.so.12.3.0.116        libnppim.so.12.3.0.116    libnvfatbin_static.a     libOpenCL.so
libcudart_static.a            libcupti.so.12            libnppc_static.a             libnppim_static.a         libnvJitLink.so          libOpenCL.so.1
libcufft.so                   libcupti.so.2024.2.0      libnppial.so                 libnppist.so              libnvJitLink.so.12       libOpenCL.so.1.0
libcufft.so.11                libcupti_static.a         libnppial.so.12              libnppist.so.12           libnvJitLink.so.12.5.40  libOpenCL.so.1.0.0
libcufft.so.11.2.3.18         libcurand.so              libnppial.so.12.3.0.116      libnppist.so.12.3.0.116   libnvJitLink_static.a    libpcsamplingutil.so
libcufft_static.a             libcurand.so.10           libnppial_static.a           libnppist_static.a        libnvjpeg.so             stubs
libcufft_static_nocallback.a  libcurand.so.10.3.6.39    libnppicc.so                 libnppisu.so              libnvjpeg.so.12
libcufftw.so                  libcurand_static.a        libnppicc.so.12              libnppisu.so.12           libnvjpeg.so.12.3.2.38
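One likely reason the linker still fails is that LD_LIBRARY_PATH is read by the dynamic loader when a program runs, not by ld when it resolves -l options at link time. The failing gfortran link line above also contains no -L/usr/local/cuda-12.5/lib64, even though libcudart_static.a and libcublas.so are in that directory. The sketch below is a possible workaround rather than a confirmed fix. It assumes CUDA 12.5 is installed in /usr/local/cuda-12.5 (as in the listing) and relies on the LIBRARY_PATH environment variable, which the GCC driver (gcc, gfortran) passes to the linker as extra library search directories.

```shell
#!/bin/sh
# Sketch: make the CUDA library directory visible to the link step.
# Assumption: CUDA 12.5 lives in /usr/local/cuda-12.5, as in the listing above.
CUDA_LIB_DIR=/usr/local/cuda-12.5/lib64

# LD_LIBRARY_PATH is used by the dynamic loader at run time;
# the GCC driver instead adds LIBRARY_PATH entries to the -l search
# directories at link time (after any explicit -L directories).
LIBRARY_PATH="$CUDA_LIB_DIR${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LIBRARY_PATH

# Report whether the two libraries named in the failing link line exist.
for lib in libcudart_static.a libcublas.so; do
  if [ -e "$CUDA_LIB_DIR/$lib" ]; then
    echo "found:   $CUDA_LIB_DIR/$lib"
  else
    echo "missing: $CUDA_LIB_DIR/$lib"
  fi
done
echo "LIBRARY_PATH=$LIBRARY_PATH"
```

Because the export must reach the shell that runs the build, source the script (for example ". ./cuda_libpath.sh", where the file name is hypothetical) and then rerun meson compile -C builddir from that same shell.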

@amontoison shouldn't meson be picking up CUBLAS here?

@jfowkes the compilation must be done with the nvfortran compiler when building with GPU support:

FC=nvfortran meson setup builddir ...
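Spelled out, this suggestion looks like the sketch below. It reuses the options from the README command earlier in the thread and assumes it is run from the top of the SPRAL source tree. Meson fixes the Fortran compiler when a build directory is configured, so FC has to be set at meson setup time on a fresh build directory; setting it again for meson compile has no effect.

```shell
#!/bin/sh
# Sketch: configure SPRAL from scratch with nvfortran as the Fortran compiler.
# Meson reads FC (and CC/CXX) only when a build directory is first set up;
# an already-configured builddir keeps its original compilers, so start clean.
# Assumption: this runs from the top of the SPRAL source tree.
if command -v nvfortran >/dev/null 2>&1 && command -v meson >/dev/null 2>&1; then
  rm -rf builddir
  # No FC= prefix is needed for the compile step: the choice is stored in builddir.
  FC=nvfortran meson setup builddir -Dexamples=true -Dtests=true \
    -Dlibblas=openblas -Dliblapack=openblas -Dlibmetis=coinmetis &&
    meson compile -C builddir
  status=$?
else
  echo "nvfortran or meson not found on PATH; nothing to do" >&2
  status=0
fi
```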

Dear @amontoison,

I've tried what you suggested, but it still doesn't work. Here are the outputs of the commands I used.

FC=nvfortran meson setup builddir -Dexamples=true -Dtests=true -Dlibblas=openblas -Dliblapack=openblas -Dlibmetis=coinmetis
The Meson build system
Version: 1.4.0
Source dir: /home/ar612/Installers/SPRAL/spral
Build dir: /home/ar612/Installers/SPRAL/spral/builddir
Build type: native build
Project name: SPRAL
Project version: 2024.05.08
Fortran compiler for the host machine: nvfortran (nvidia_hpc 24.7-0)
Fortran linker for the host machine: nvfortran pgi 24.7-0
C compiler for the host machine: cc (gcc 11.4.0 "cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
C linker for the host machine: cc ld.bfd 2.38
C++ compiler for the host machine: c++ (gcc 11.4.0 "c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
C++ linker for the host machine: c++ ld.bfd 2.38
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Library openblas found: YES
Library openblas found: YES
Library coinmetis found: YES
Library hwloc found: YES
Run-time dependency CUDA (modules: cudart_static, rt, pthread, dl, cublas) found: YES 12.5 (/usr/local/cuda-12.5)
Library m found: YES
Has header "cblas.h" : YES 
Has header "hwloc.h" : YES 
Build targets in project: 45

SPRAL 2024.05.08

  User defined options
    examples : true
    libblas  : openblas
    liblapack: openblas
    libmetis : coinmetis
    tests    : true

Found ninja-1.10.1 at /usr/bin/ninja
FC=nvfortran meson compile -C builddir
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/ar612/Installers/SPRAL/spral/builddir
ninja: Entering directory `/home/ar612/Installers/SPRAL/spral/builddir'
[108/195] Compiling Fortran object libspral.so.p/src_ssids_gpu_factor.f90.o
FAILED: libspral.so.p/src_ssids_gpu_factor.f90.o libspral.so.p/spral_ssids_gpu_factor.mod 
nvfortran -Ilibspral.so.p -I. -I.. -Iinclude -I../include -Isrc -I../src -I/usr/local/cuda-12.5/include -O3 -mp -fPIC -module libspral.so.p -o libspral.so.p/src_ssids_gpu_factor.f90.o -c ../src/ssids/gpu/factor.f90
NVFORTRAN-F-0000-Internal compiler error. mk_assign_sptr: upper bound missing       0  (../src/ssids/gpu/factor.f90: 101)
NVFORTRAN/x86-64 Linux 24.7-0: compilation aborted
[110/195] Compiling C++ object kernelst_cpp.p/tests_ssids_kernels_ldlt_app.cxx.o
ninja: build stopped: subcommand failed.

It seems to be an issue with the new version of the nvfortran compiler.
It was working with version 23.x.

@jfowkes Can you fix the error in src/ssids/gpu/factor.f90 or is the issue in the compiler?

@amontoison unfortunately this is an internal compiler error:

NVFORTRAN-F-0000-Internal compiler error. mk_assign_sptr: upper bound missing

and is therefore a bug in NVIDIA's compiler.

@jfowkes
I quickly checked line 101, and on line 100 there is an extra space in the goto:
https://github.com/ralna/spral/blob/master/src/ssids/gpu/factor.f90#L100
Could it be related?

@amontoison very well spotted! I will fix that now (it should work nonetheless).

@aleramos119 could you try again on the latest version from the master branch? If that fixes it I'll do a new release.

I'm sorry for the late response @jfowkes. I tried the latest version of the master branch, but the error persists.

rm -rf spral
git clone -b master https://github.com/ralna/spral.git
cd spral
FC=nvfortran meson setup builddir -Dexamples=true -Dtests=true -Dlibblas=openblas -Dliblapack=openblas -Dlibmetis=coinmetis
FC=nvfortran meson compile -C builddir
Cloning into 'spral'...
remote: Enumerating objects: 11545, done.
remote: Counting objects: 100% (1082/1082), done.
remote: Compressing objects: 100% (392/392), done.
remote: Total 11545 (delta 785), reused 828 (delta 687), pack-reused 10463 (from 1)
Receiving objects: 100% (11545/11545), 7.97 MiB | 5.38 MiB/s, done.
Resolving deltas: 100% (8686/8686), done.
The Meson build system
Version: 1.4.0
Source dir: /home/ar612/Installers/SPRAL/spral
Build dir: /home/ar612/Installers/SPRAL/spral/builddir
Build type: native build
Project name: SPRAL
Project version: 2024.05.08
Fortran compiler for the host machine: nvfortran (nvidia_hpc 24.7-0)
Fortran linker for the host machine: nvfortran pgi 24.7-0
C compiler for the host machine: cc (gcc 11.4.0 "cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
C linker for the host machine: cc ld.bfd 2.38
C++ compiler for the host machine: c++ (gcc 11.4.0 "c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0")
C++ linker for the host machine: c++ ld.bfd 2.38
Host machine cpu family: x86_64
Host machine cpu: x86_64
Cuda compiler for the host machine: nvcc (nvcc 12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0)
Cuda linker for the host machine: nvcc nvlink 12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
Library openblas found: YES
Library openblas found: YES
Library coinmetis found: YES
Library hwloc found: YES
Run-time dependency CUDA (modules: cudart_static, rt, pthread, dl, cublas) found: YES 12.5 (/usr/local/cuda-12.5)
Library m found: YES
Has header "cblas.h" : YES 
Has header "hwloc.h" : YES 
Build targets in project: 45

SPRAL 2024.05.08

  User defined options
    examples : true
    libblas  : openblas
    liblapack: openblas
    libmetis : coinmetis
    tests    : true

Found ninja-1.10.1 at /usr/bin/ninja
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /home/ar612/Installers/SPRAL/spral/builddir
ninja: Entering directory `/home/ar612/Installers/SPRAL/spral/builddir'
[107/195] Compiling Fortran object libspral.so.p/src_ssids_gpu_factor.f90.o
FAILED: libspral.so.p/src_ssids_gpu_factor.f90.o libspral.so.p/spral_ssids_gpu_factor.mod 
nvfortran -Ilibspral.so.p -I. -I.. -Iinclude -I../include -Isrc -I../src -I/usr/local/cuda-12.5/include -O3 -mp -fPIC -module libspral.so.p -o libspral.so.p/src_ssids_gpu_factor.f90.o -c ../src/ssids/gpu/factor.f90
NVFORTRAN-F-0000-Internal compiler error. mk_assign_sptr: upper bound missing       0  (../src/ssids/gpu/factor.f90: 101)
NVFORTRAN/x86-64 Linux 24.7-0: compilation aborted
[110/195] Compiling C++ object kernelst_cpp.p/tests_ssids_kernels_ldlt_app.cxx.o
ninja: build stopped: subcommand failed.
make: *** No rule to make target 'check'.  Stop.
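As an aside, the final make check line fails because a Meson build tree has no Makefile; Meson runs the test suite through its own test runner instead. A minimal sketch, assuming the build directory is ./builddir as above (it only becomes useful once the build itself succeeds):

```shell
#!/bin/sh
# Meson's counterpart to "make check": run the tests registered in builddir.
# Sketch only; assumes the build directory from the commands above is ./builddir.
if command -v meson >/dev/null 2>&1 && [ -d builddir ]; then
  # --print-errorlogs prints the output of any failing test.
  meson test -C builddir --print-errorlogs
  status=$?
else
  echo "meson or ./builddir not found; skipping tests" >&2
  status=0
fi
```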

Thank you @aleramos119. So, unfortunately, this is an internal bug in Nvidia's compiler:

NVFORTRAN-F-0000-Internal compiler error. mk_assign_sptr: upper bound missing

There is nothing we can do until Nvidia fixes it, I'm afraid (it may be worth reporting to them).