NERSC/buildtest-nersc

[Bug]: aml spack test failed due to libcudart.so

CDASH Build

https://my.cdash.org/test/102913122

Link to buildspec file

https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/spack_test/perlmutter/23.05/aml.yml

Please describe the issue?

The failure occurs at the following step, where the dynamic loader cannot find the libcudart.so library:

Command exited with status 127:
    './0_hello'
./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory

Apparently some tests require the CUDA runtime, even though this build of aml was installed without CUDA support (note the ~cuda variant in the spec below):

-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
dzrvltdzrinvi5ps73jmxco3fsevwc2l aml@0.2.0~cuda~hip~hwloc~opencl~ze build_system=autotools hip-platform=none  /global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l
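
To confirm which shared libraries the loader cannot resolve for the test binary, one option is to filter the output of ldd. The sketch below is only an illustration: the helper name missing_libs is hypothetical, and it assumes the usual glibc ldd format in which unresolved entries are printed as "name => not found".

```shell
#!/bin/sh
# Hypothetical helper (not part of spack or buildtest): read `ldd` output on
# stdin and print the name of every library the loader could not resolve.
# Assumes the glibc ldd format: "<name> => not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example usage, run from the directory where the test binary was built:
#   ldd ./0_hello | missing_libs
```

If libcudart.so.11.0 shows up here, running ldd on the libraries the binary links against (-laml, -lexcit) should indicate which of them carries the CUDA dependency.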

Relevant log output

_______________________________________________________________________________________________________
     The Extreme-Scale Scientific Software Stack (E4S) is accessible via the
Spack package manager.

     In order to access the production stack, you will need to load a spack
environment. Here are some tips to get started:


     'spack env list' - List all Spack environments
     'spack env activate gcc' - Activate the "gcc" Spack environment
     'spack env status' - Display the active Spack environment
     'spack load amrex' - Load the "amrex" Spack package into your user
environment

     For additional support, please refer to the following references:

       NERSC E4S Documentation: https://docs.nersc.gov/applications/e4s/
       E4S Documentation: https://e4s.readthedocs.io
       Spack Documentation: https://spack.readthedocs.io/en/latest/
       Spack Slack: https://spackpm.slack.com


______________________________________________________________________________________________________
     
==> Error: TestFailure: 1 test failed.


Command exited with status 127:
    './0_hello'
./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory



1 error found in test log:
     3    ==> [2023-11-07-07:43:24.179560] test: test_check_tutorial: Compile and run the tutorial tests as install checks
     4    ==> [2023-11-07-07:43:24.183898] '/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/env/gcc/gcc' '-o' '0_hello' '/global/ho
          mes/b/bdtest/.spack/test/rvdzxngt3yt5wumdyamqifh7kuv3mw3w/aml-0.2.0-dzrvltd/cache/aml/doc/tutorials/hello_world/0_hello.c' '-I/global/common/software
          /spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l/include' '-I/global/comm
          on/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/numactl-2.0.14-thubjl4qwojk3icuocgn6uhmetkk4vkj/include'
           '-L/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc
          2l/lib' '-laml' '-lexcit' '-lpthread'
     5    ==> [2023-11-07-07:43:26.163540] './0_hello'
     6    ./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
     7    FAILED: Aml::test_check_tutorial: Command exited with status 127:
     8        './0_hello'
  >> 9    ./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
     10   
     11     File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/bin/spack", line 54, in <module>
     12       sys.exit(main())
     13     File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/spack_installable/main.py", line 37, in main
     14       sys.exit(spack.main.main(argv))
     15     File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/spack/main.py", line 1018, in main


See test log for details:
  /global/homes/b/bdtest/.spack/test/rvdzxngt3yt5wumdyamqifh7kuv3mw3w/aml-0.2.0-dzrvltd-test-out.txt

==> Error: 1 test(s) in the suite failed.

A potential workaround is to load the cudatoolkit module with ml cudatoolkit/11.7 and see whether that fixes the issue. We should pin the same CUDA version that was used to build the other packages with CUDA support.
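
Before rerunning the test with the module loaded, it may help to check that the library is actually visible to the loader. The following is a minimal sketch, assuming the cudatoolkit module exposes its libraries through LD_LIBRARY_PATH (this has not been verified here); the helper name lib_on_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named library file exists in any
# directory listed in LD_LIBRARY_PATH, fail otherwise.
# Assumption: the loaded module makes libcudart visible via LD_LIBRARY_PATH.
lib_on_path() {
    lib=$1
    old_ifs=$IFS
    IFS=:
    for dir in $LD_LIBRARY_PATH; do
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example usage after loading the module:
#   ml cudatoolkit/11.7
#   lib_on_path libcudart.so.11.0 && spack test run aml
```

If the check fails even with the module loaded, the library is probably exposed some other way (or not at all), and the workaround would need a different approach.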