[Bug]: aml spack test failed due to libcudart.so
shahzebsiddiqui commented
CDASH Build
https://my.cdash.org/test/102913122
Link to buildspec file
Please describe the issue?
The error is in the following line, where the loader can't find the libcudart.so library:
Command exited with status 127:
'./0_hello'
./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
Apparently some tests require the CUDA runtime even though this build of aml is without CUDA support (note ~cuda in the spec):
-- linux-sles15-zen3 / gcc@11.2.0 -------------------------------
dzrvltdzrinvi5ps73jmxco3fsevwc2l aml@0.2.0~cuda~hip~hwloc~opencl~ze build_system=autotools hip-platform=none /global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l
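If it helps, one way to confirm where the CUDA runtime requirement comes from is to inspect the installed library directly. A minimal sketch, assuming the install prefix from the spec above:

```
# Resolve the shared-library dependencies of the ~cuda build of libaml.so;
# a "libcudart.so.11.0 => not found" line here would match the test failure.
AML_PREFIX=/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l
ldd "$AML_PREFIX/lib/libaml.so" | grep -i cuda
```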
Relevant log output
_______________________________________________________________________________________________________
The Extreme-Scale Scientific Software Stack (E4S) is accessible via the
Spack package manager.
In order to access the production stack, you will need to load a spack
environment. Here are some tips to get started:
'spack env list' - List all Spack environments
'spack env activate gcc' - Activate the "gcc" Spack environment
'spack env status' - Display the active Spack environment
'spack load amrex' - Load the "amrex" Spack package into your user
environment
For additional support, please refer to the following references:
NERSC E4S Documentation: https://docs.nersc.gov/applications/e4s/
E4S Documentation: https://e4s.readthedocs.io
Spack Documentation: https://spack.readthedocs.io/en/latest/
Spack Slack: https://spackpm.slack.com
______________________________________________________________________________________________________
==> Error: TestFailure: 1 test failed.
Command exited with status 127:
'./0_hello'
./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
1 error found in test log:
3 ==> [2023-11-07-07:43:24.179560] test: test_check_tutorial: Compile and run the tutorial tests as install checks
4 ==> [2023-11-07-07:43:24.183898] '/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/env/gcc/gcc' '-o' '0_hello' '/global/homes/b/bdtest/.spack/test/rvdzxngt3yt5wumdyamqifh7kuv3mw3w/aml-0.2.0-dzrvltd/cache/aml/doc/tutorials/hello_world/0_hello.c' '-I/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l/include' '-I/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/numactl-2.0.14-thubjl4qwojk3icuocgn6uhmetkk4vkj/include' '-L/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l/lib' '-laml' '-lexcit' '-lpthread'
5 ==> [2023-11-07-07:43:26.163540] './0_hello'
6 ./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
7 FAILED: Aml::test_check_tutorial: Command exited with status 127:
8 './0_hello'
>> 9 ./0_hello: error while loading shared libraries: libcudart.so.11.0: cannot open shared object file: No such file or directory
10
11 File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/bin/spack", line 54, in <module>
12 sys.exit(main())
13 File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/spack_installable/main.py", line 37, in main
14 sys.exit(spack.main.main(argv))
15 File "/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/lib/spack/spack/main.py", line 1018, in main
See test log for details:
/global/homes/b/bdtest/.spack/test/rvdzxngt3yt5wumdyamqifh7kuv3mw3w/aml-0.2.0-dzrvltd-test-out.txt
==> Error: 1 test(s) in the suite failed.
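One more observation from the log: the compile line links only '-laml' '-lexcit' '-lpthread' with no CUDA flags, so the libcudart.so.11.0 requirement is presumably recorded in libaml.so (or libexcit.so) itself rather than in the tutorial binary. A hedged way to check which library carries it, using the same install prefix as above:

```
# List the shared-library (DT_NEEDED) entries baked into libaml.so at link
# time; a libcudart.so.11.0 entry here would pin the dependency on aml itself.
AML_PREFIX=/global/common/software/spackecp/perlmutter/e4s-23.05/89639/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/aml-0.2.0-dzrvltdzrinvi5ps73jmxco3fsevwc2l
readelf -d "$AML_PREFIX/lib/libaml.so" | grep NEEDED
```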
shahzebsiddiqui commented
I think a potential workaround could be to load the cudatoolkit module (ml cudatoolkit/11.7) and see if that fixes the issue. We should use the same CUDA version that was used to build the other packages with CUDA support.
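A minimal sketch of that workaround (the cudatoolkit/11.7 module name and version are assumptions and should be checked against the CUDA actually used elsewhere in the e4s-23.05 stack):

```
# Load the CUDA toolkit module so libcudart.so.11.0 is on the loader's
# search path, then re-run the standalone spack tests for aml.
ml cudatoolkit/11.7   # assumed version; pin to the stack's CUDA version
spack test run aml
```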