Could not load dynamic library 'libcuda.so.1'

Question

Nicobouch opened this issue a year ago · 1 comment

Description of the bug

I get the following errors using this command line:

nextflow run proteinfold-dev --input sample_multi.csv --outdir colabfold_test --mode colabfold --colabfold_server webserver --num_recycle 20 --colabfold_model_preset alphafold2_multimer_v3 -profile ifb_core

I tried twice.

Command used and terminal output

Command error:
  INFO:    Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  2023-05-26 18:16:41.703867: W external/org_tensorflow/tensorflow/tsl/platform/default/dso_loader.cc:66] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /localcolabfold/colabfold-conda/lib:/usr/local/cuda/lib64:/.singularity.d/libs
  2023-05-26 18:16:41.703924: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:265] failed call to cuInit: UNKNOWN ERROR (303)
  2023-05-26 18:16:52.656784: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
  
    0%|          | 0/450 [elapsed: 00:00 remaining: ?]
  SUBMIT:   0%|          | 0/450 [elapsed: 00:00 remaining: ?]
  COMPLETE:   0%|          | 0/450 [elapsed: 00:00 remaining: ?]
  COMPLETE: 100%|██████████| 450/450 [elapsed: 00:00 remaining: 00:00]
  COMPLETE: 100%|██████████| 450/450 [elapsed: 00:02 remaining: 00:00]
  
    0%|          | 0/450 [elapsed: 00:00 remaining: ?]
  SUBMIT:   0%|          | 0/450 [elapsed: 00:00 remaining: ?]
  COMPLETE:   0%|          | 0/450 [elapsed: 00:00 remaining: ?]
  COMPLETE: 100%|██████████| 450/450 [elapsed: 00:00 remaining: 00:00]
  COMPLETE: 100%|██████████| 450/450 [elapsed: 00:01 remaining: 00:00]

Relevant files

No response

System information

No response

Answer 1 · 2023-06-03T14:20:56.000Z

Assuming that you are running on CPUs, I managed to reproduce your issue on older-generation CPUs that lack AVX2 support (e.g. SSE2-only), but not on AVX2-capable CPUs. Looking a bit deeper into the bug, I found that the run crashes with the following error:

E external/org_tensorflow/tensorflow/compiler/xla/service/cpu/simple_orc_jit.cc:211] Unable to resolve runtime symbol: `__extendhfsf2'.  Hint: if the symbol a custom call target, make sure you've registered it with the JIT using XLA_CPU_REGISTER_CUSTOM_CALL_TARGET.
  JIT session error: Symbols not found: [ __extendhfsf2 ]

Therefore, this is a bug in the latest version of colabfold rather than in the pipeline. So, unless you have access to a GPU or to newer CPUs, I would recommend opening this issue in the colabfold or localcolabfold repository.
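
To tell which of the two situations above applies on a given compute node, a small diagnostic along these lines may help. It is only a sketch under stated assumptions: it assumes an x86 Linux host where /proc/cpuinfo exposes a "flags" line, and the helper names `cpu_flags` and `can_load` are made up for illustration and are not part of colabfold or the pipeline. It should be run inside the same container and on the same node as the failing task, since the container's library path decides whether libcuda.so.1 is visible.

```python
import ctypes
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags found in /proc/cpuinfo-style text.

    Assumes the x86 Linux layout, where each processor entry has a
    line of the form "flags : fpu vme ... avx2 ...".
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    # No AVX2 flag matches the CPU-only case that reproduced the crash.
    print("AVX2 supported:", "avx2" in cpu_flags(text))
    # False here matches the 'Could not load dynamic library' warning.
    print("libcuda.so.1 loadable:", can_load("libcuda.so.1"))
```

If both lines print False, the job is running on a CPU without AVX2 and without a visible GPU driver, which is the configuration in which the `__extendhfsf2` error was reproduced.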