Could not load dynamic library 'libcuda.so.1'
Nicobouch opened this issue · 1 comment
Description of the bug
I get the following errors using this command line:
nextflow run proteinfold-dev --input sample_multi.csv --outdir colabfold_test --mode colabfold --colabfold_server webserver --num_recycle 20 --colabfold_model_preset alphafold2_multimer_v3 -profile ifb_core
I have tried running it twice.
Command used and terminal output
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
2023-05-26 18:16:41.703867: W external/org_tensorflow/tensorflow/tsl/platform/default/dso_loader.cc:66] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /localcolabfold/colabfold-conda/lib:/usr/local/cuda/lib64:/.singularity.d/libs
2023-05-26 18:16:41.703924: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
2023-05-26 18:16:52.656784: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
0%| | 0/450 [elapsed: 00:00 remaining: ?]
SUBMIT: 0%| | 0/450 [elapsed: 00:00 remaining: ?]
COMPLETE: 0%| | 0/450 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 450/450 [elapsed: 00:00 remaining: 00:00]
COMPLETE: 100%|██████████| 450/450 [elapsed: 00:02 remaining: 00:00]
0%| | 0/450 [elapsed: 00:00 remaining: ?]
SUBMIT: 0%| | 0/450 [elapsed: 00:00 remaining: ?]
COMPLETE: 0%| | 0/450 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 450/450 [elapsed: 00:00 remaining: 00:00]
COMPLETE: 100%|██████████| 450/450 [elapsed: 00:01 remaining: 00:00]
Relevant files
No response
System information
No response
Assuming you are running on CPUs, I managed to reproduce your issue on older-generation CPUs that do not support AVX2 instructions (e.g. SSE2-only), but not on AVX2-capable processors. Looking a bit deeper into the bug, I realized that it crashes due to the following error:
E external/org_tensorflow/tensorflow/compiler/xla/service/cpu/simple_orc_jit.cc:211] Unable to resolve runtime symbol: `__extendhfsf2'. Hint: if the symbol a custom call target, make sure you've registered it with the JIT using XLA_CPU_REGISTER_CUSTOM_CALL_TARGET.
JIT session error: Symbols not found: [ __extendhfsf2 ]
Therefore, this is not a bug in the pipeline but rather in the latest version of ColabFold. So, unless you have access to a GPU or newer CPU resources, I would recommend opening this issue in the colabfold or localcolabfold repository.
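For reference, and assuming your jobs land on a Linux compute node (this is just a generic check on my side, not something the pipeline runs), you can verify whether the CPU advertises AVX2 with something like:

# Prints "avx2" if the CPU exposes the AVX2 flag; otherwise reports its absence
grep -m1 -o avx2 /proc/cpuinfo || echo "no AVX2 support reported"

If that reports no AVX2 on the node where the ColabFold task ran, the crash above is expected on that hardware.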