Installation problems: libtorch_cuda_cu.so not found

Question

Closed this issue 2 years ago · 5 comments

We installed miniconda (user-level install), then activated a new conda environment for tankbind, and we installed TankBind following the instructions to the letter. We found the following error:

(tankbind) rodriguezg@darwin:~/code/TankBind/examples$ python -V
Python 3.8.13

(tankbind) rodriguezg@darwin:~/code/TankBind/examples$ python virtual_screening_test_tankbind.py 
Traceback (most recent call last):
  File "virtual_screening_test_tankbind.py", line 11, in <module>
    from feature_utils import get_protein_feature
  File "/data/home/rodriguezg/code/TankBind/examples/../tankbind/feature_utils.py", line 21, in <module>
    from torchdrug import data as td     # conda install torchdrug -c milagraph -c conda-forge -c pytorch -c pyg if fail to import
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torchdrug/__init__.py", line 1, in <module>
    from . import patch
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torchdrug/patch.py", line 13, in <module>
    from torchdrug import core, data
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torchdrug/core/__init__.py", line 2, in <module>
    from .engine import Engine
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torchdrug/core/engine.py", line 10, in <module>
    from torchdrug import data, core, utils
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torchdrug/data/__init__.py", line 1, in <module>
    from .dictionary import PerfectHash, Dictionary
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torchdrug/data/dictionary.py", line 4, in <module>
    from torch_scatter import scatter_max
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torch_scatter/__init__.py", line 16, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/site-packages/torch/_ops.py", line 255, in load_library
    ctypes.CDLL(path)
  File "/home/rodriguezg/miniconda3/envs/tankbind/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
(tankbind) rodriguezg@darwin:~/code/TankBind/examples$

This library libtorch_cuda_cu.so is nowhere to be found. Any ideas?

For what it's worth, the host has NVIDIA kernel driver version 510.85.02 (I'm not sure whether this influences which versions of the CUDA libraries conda installs).
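One way to narrow this down is to list the shared libraries that the installed torch package actually ships and compare them with the name in the OSError. The sketch below is only a diagnostic aid, not part of TankBind: the helper names `torch_lib_dir` and `list_shared_libs` are made up for illustration, and it assumes the usual pip/conda layout where torch keeps its libraries in a `lib/` folder next to its `__init__.py`. If the library named in the error is missing from the listing, a likely (but unconfirmed) explanation is that torch_scatter was built against a different PyTorch build than the one installed.

```python
import importlib.util
from pathlib import Path


def torch_lib_dir():
    """Return the lib/ directory of the installed torch package, or None.

    find_spec locates the package without importing it, so this also works
    when importing torch (or an extension built against it) would fail.
    """
    try:
        spec = importlib.util.find_spec("torch")
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    # Assumption: libraries live in <site-packages>/torch/lib.
    return Path(spec.origin).parent / "lib"


def list_shared_libs(lib_dir, prefix="libtorch"):
    """Return sorted names of shared objects in lib_dir starting with prefix."""
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob(prefix + "*.so*"))


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        print("torch libraries in", lib_dir)
        for name in list_shared_libs(lib_dir):
            print("  ", name)
```

Run it inside the activated environment so that it inspects the same site-packages directory that appears in the traceback.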

Answer 1 · 2022-11-09T02:15:49.000Z

torch_scatter is part of PyG. You could try re-installing the version that matches your CUDA version,
following the instructions at https://github.com/pyg-team/pytorch_geometric:

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+${CUDA}.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.12.0+${CUDA}.html
pip install torch-geometric

Hope this helps.
Wei
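The `${CUDA}` placeholder in the commands above has to match the CUDA version that the installed torch was built with. A minimal sketch for deriving the matching wheel index URL is shown below. The function names `cuda_tag` and `pyg_wheel_index` are hypothetical, and the URL pattern is simply copied from the pip commands in this answer, so treat the output as a suggestion to check against the PyG install page rather than an authoritative value.

```python
import re


def cuda_tag(cuda_version):
    """Turn a CUDA version such as '11.6' into a wheel tag such as 'cu116'.

    None (a CPU-only torch build) maps to 'cpu'.
    """
    if cuda_version is None:
        return "cpu"
    major, minor = cuda_version.split(".")[:2]
    return "cu" + major + minor


def pyg_wheel_index(torch_version, cuda_version):
    """Build the -f URL following the pattern used in the pip commands above."""
    # Drop any local suffix such as '+cu116' from torch.__version__.
    base = re.match(r"\d+\.\d+\.\d+", torch_version).group(0)
    return "https://data.pyg.org/whl/torch-{}+{}.html".format(
        base, cuda_tag(cuda_version)
    )


if __name__ == "__main__":
    try:
        import torch
    except (ImportError, OSError):
        print("torch could not be imported; install or repair it first")
    else:
        # torch.version.cuda is a string like '11.6', or None for CPU builds.
        print(pyg_wheel_index(torch.__version__, torch.version.cuda))
```

The printed URL can then be passed to `pip install torch-scatter -f <url>`, provided PyG publishes wheels for that torch/CUDA combination.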

Answer 2 · 2022-11-11T12:08:04.000Z

Thank you very much; your suggestion pointed me in the right direction. And since I'll admit I am biased against Conda's shotgun approach to dependency management, I decided to:

- use neither Conda nor Miniconda; use only pip
- follow the latest (pip-related) install instructions of the original authors of each required package
- use the latest possible (stable) versions of all packages, subject to common-denominator constraints (for instance, PyG does not currently publish wheels for Torch 1.13; see below).

With this in mind, the following installation procedure worked for me. My host environment is Ubuntu 22.04 LTS with NVIDIA driver 510 (CUDA 11.6). I had to install Python 3.9 (I use pyenv for this) because TorchDrug requires a version < 3.10. Then, in a clean venv, I installed everything as follows:

MY_VENV=tb_venv
CUDA=cu116

################################
# Install TankBind deps
################################
#
# Install PyTorch (latest stable is PyTorch 1.13, but PyG only supports up to 1.12):
pip install torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/${CUDA} 

# Install Torch Geometric (needs PyTorch <=1.12)
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+${CUDA}.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.12.1+${CUDA}.html
pip install torch-geometric

# Install TorchDrug (Requires: Python >=3.7, <3.10)
pip install torch-cluster -f https://data.pyg.org/whl/torch-1.12.1+${CUDA}.html 
pip install torchdrug

# The rest didn't give me any trouble:
pip install torchmetrics biopython jupyterlab nglview tqdm mlcrate pyarrow


################################
# Install TankBind
################################
#
# Clone the TankBind repo:
rm -rf TankBind
git clone https://github.com/luwei0917/TankBind.git

# Download and install p2rank
cd TankBind
wget https://github.com/rdk/p2rank/releases/download/2.4/p2rank_2.4.tar.gz
tar zxf p2rank_2.4.tar.gz

So this gives me an apparently flawless install. Then I tried to run TankBind, but I found an error. Here's the output when running examples/high_throughput_virtual_screening_LRRK2_WDR.ipynb (I extracted this Notebook's code to a script, in order to run it from the console):

(tb_venv) marinjl@darwin:~/TANKBIND/TankBind/examples$ python virtual_screening_test_tankbind.py 
Reading protein...
Protein PDB read and formatted.
----------------------------------------------------------------------------------------------
 P2Rank 2.4
----------------------------------------------------------------------------------------------

predicting pockets for proteins from dataset [protein_list.ds]
processing [6dlo.pdb] (1/1)
predicting pockets finished in 0 hours 0 minutes 8.083 seconds
results saved to directory [/data/home/marinjl/TANKBIND/TankBind/examples/HTVS/p2rank]

----------------------------------------------------------------------------------------------
 finished successfully in 0 hours 0 minutes 9.751 seconds
----------------------------------------------------------------------------------------------
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [00:01<00:00, 7913.11it/s]
Pocketing succesfully done.
Processing...
Done!
['HTVS/dataset/processed/data.pt', 'HTVS/dataset/processed/protein.pt']
Constructing dataset...
12:30:57   5 stack, readout2, pred dis map add self attention and GVP embed, compound model GIN
Loading model...
Ready to start inference.
  0%|                                                                                                                                                        | 0/2000 [00:00<?, ?it/s]
/home/marinjl/.pyenv/versions/tb_venv/lib/python3.9/site-packages/torchdrug/utils/decorator.py:189: UserWarning: from_smiles(): argument `node_feature` is deprecated in favor of `atom_feature`
  warnings.warn("%s(): argument `%s` is deprecated in favor of `%s`" % (obj.__name__, key, value))

[... a few more deprecation warnings like this from TorchDrug, which look harmless ...]

  0%|                                                                                                                                                        | 0/2000 [00:19<?, ?it/s]
Traceback (most recent call last):
  File "/data/home/marinjl/TANKBIND/TankBind/examples/virtual_screening_test_tankbind.py", line 169, in <module>
    y_pred, affinity_pred = model(data)
  File "/home/marinjl/.pyenv/versions/tb_venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/marinjl/TANKBIND/TankBind/examples/../tankbind/model.py", line 343, in forward
    compound_out = self.conv_compound(compound_edge_index,edge_weight,compound_edge_feature,compound_x.shape[0],compound_x)['node_feature']
  File "/home/marinjl/.pyenv/versions/tb_venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/marinjl/TANKBIND/TankBind/examples/../tankbind/GINv2.py", line 182, in forward
    hidden = layer(edge_list, edge_weight, edge_feature, num_node, layer_input)
  File "/home/marinjl/.pyenv/versions/tb_venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/marinjl/TANKBIND/TankBind/examples/../tankbind/GINv2.py", line 110, in forward
    update = self.message_and_aggregate(edge_list, edge_weight, edge_feature, num_node, input)
  File "/data/home/marinjl/TANKBIND/TankBind/examples/../tankbind/GINv2.py", line 91, in message_and_aggregate
    edge_update = self.edge_linear(edge_update)
  File "/home/marinjl/.pyenv/versions/tb_venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/marinjl/.pyenv/versions/tb_venv/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (123x18 and 19x56)
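To make the numbers in that last line concrete: the input to the linear layer has 123 rows of 18 features each, while the layer's weight maps 19 inputs to 56 outputs, so the inner dimensions (18 and 19) disagree. The sketch below reproduces only that shape rule in plain Python. `linear_output_shape` is an illustrative helper, not a PyTorch or TankBind function, and it says nothing about why the feature width differs.

```python
def linear_output_shape(input_shape, in_features, out_features):
    """Return the output shape of a linear layer, or raise on a width mismatch.

    Mirrors the rule behind the error: the last input dimension must equal
    the layer's in_features.
    """
    rows, cols = input_shape
    if cols != in_features:
        raise ValueError(
            "input has {} columns but the layer expects {}".format(cols, in_features)
        )
    return (rows, out_features)


if __name__ == "__main__":
    # Numbers taken from the traceback: 123 rows of 18 features,
    # fed to a layer whose weight maps 19 inputs to 56 outputs.
    try:
        linear_output_shape((123, 18), in_features=19, out_features=56)
    except ValueError as err:
        print("mismatch:", err)
```

In other words, the model expects one more edge feature per row than the feature extraction step produced in this environment.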

Any ideas? Do you guys have a sort of minimal working example in order to test if TankBind is installed correctly?
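While waiting for an official test, a version report makes it easier to compare an environment against a known-good one. The following is a rough sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8). The package list is an assumption taken from the pip commands in this thread, and the script only reports versions; it does not verify that TankBind itself runs.

```python
from importlib import metadata

# Distribution names as used in the pip commands in this thread (assumed list).
PACKAGES = [
    "torch",
    "torch-scatter",
    "torch-sparse",
    "torch-cluster",
    "torch-geometric",
    "torchdrug",
]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print("{:16s} {}".format(name, version or "NOT INSTALLED"))
```

Pasting this output into a bug report alongside the Python version and NVIDIA driver version should give maintainers most of what they need to reproduce an install problem.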

Answer 3 · 2022-11-14T02:01:49.000Z

"RuntimeError: mat1 and mat2 shapes cannot be multiplied (123x18 and 19x56)" occurs when using a newer version of torchdrug. Please make sure that you are using v0.1.2.

Answer 4 · 2022-11-14T15:59:49.000Z

Thanks, the example is working now. Just out of curiosity (and because I have the whole install process scripted), I went backwards testing all previous versions of TorchDrug in sequence (see https://pypi.org/project/torchdrug/#history). It was only after I arrived at torchdrug version 0.1.2.post1 that the error went away. In turn, this version needed a downgrade to Python 3.8.
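For anyone scripting the install, a small guard can flag an unexpected TorchDrug version before inference starts. This is a sketch under the assumption that only the 0.1.2 release line (including 0.1.2.post1) works with the current TankBind code, as reported in this thread. `parse_release` and `torchdrug_ok` are hypothetical helper names, not part of TankBind or TorchDrug.

```python
import sys


def parse_release(version):
    """Return the leading numeric components, e.g. '0.1.2.post1' -> (0, 1, 2)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def torchdrug_ok(version):
    """True for the 0.1.2 release line reported to work in this thread."""
    return parse_release(version)[:3] == (0, 1, 2)


if __name__ == "__main__":
    from importlib import metadata

    try:
        version = metadata.version("torchdrug")
    except metadata.PackageNotFoundError:
        print("torchdrug is not installed")
    else:
        status = "ok" if torchdrug_ok(version) else "expected the 0.1.2 line"
        print("torchdrug", version, status)
    print("python", "{}.{}".format(*sys.version_info[:2]))
```

If the check fails, pinning the version explicitly with `pip install torchdrug==0.1.2.post1` (in a Python 3.8 environment, per the above) is the combination that worked here.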

Any chance you guys will upgrade your code to the latest TorchDrug soon?

Anyway, I'm closing this issue since everything seems to be working fine. Thanks for your help!

Answer 5 · 2022-11-15T01:49:35.000Z

Glad to hear that you resolved the problem. We plan to drop the torchdrug dependency in the future. It was mainly used to extract features from the small molecule, but it comes loaded with many other modules that we don't need.