sartorius-research/LIVECell

Error "libcudart.so.11.0: cannot open shared object file" when using Docker image


I've been trying to train the LIVECell anchor-based model on my own dataset, but training fails before it starts.

I used the Docker image pytorch/pytorch:1.5-cuda10.1-cudnn7-devel to match the versions you mentioned in the paper.
Running the training script then fails with the error "libcudart.so.11.0: cannot open shared object file: No such file or directory".

The error traceback is as follows:

Traceback (most recent call last):
  File "train_net.py", line 27, in <module>
    from detectron2.data import MetadataCatalog
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/data/__init__.py", line 4, in <module>
    from .build import (
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/data/build.py", line 14, in <module>
    from detectron2.structures import BoxMode
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/structures/__init__.py", line 6, in <module>
    from .keypoints import Keypoints, heatmaps_to_keypoints
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/structures/keypoints.py", line 6, in <module>
    from detectron2.layers import interpolate
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/layers/__init__.py", line 3, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "/workspaces/livecell-anchor-based/detectron2-ResNeSt/detectron2/layers/deform_conv.py", line 10, in <module>
    from detectron2 import _C
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

This is probably because the CUDA toolkit version inside the Docker image (10.1) does not match the CUDA version that Detectron2-ResNeSt was built against (11.x?).
Should I pin a specific version of Detectron2-ResNeSt?
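To make the suspected mismatch concrete, here is a minimal sketch that reads the CUDA runtime version out of the ImportError message and compares it with the toolkit version in the image. The helper names `required_cudart_version` and `describe_mismatch` are hypothetical and are not part of Detectron2 or PyTorch; the toolkit version is passed in by hand, taken from the image tag.

```python
import re


def required_cudart_version(error_message):
    """Return the version suffix of libcudart.so named in the message, or None."""
    match = re.search(r"libcudart\.so\.(\d+(?:\.\d+)*)", error_message)
    return match.group(1) if match else None


def describe_mismatch(error_message, toolkit_version):
    """Compare the library version in the error with the installed toolkit version."""
    required = required_cudart_version(error_message)
    if required is None:
        return "no libcudart version found in the message"
    if required == toolkit_version:
        return "extension and toolkit agree on CUDA " + required
    return "extension expects CUDA {}, but the toolkit is {}".format(
        required, toolkit_version
    )


# Last line of the traceback above; "10.1" comes from the Docker image tag.
msg = "libcudart.so.11.0: cannot open shared object file: No such file or directory"
print(describe_mismatch(msg, "10.1"))
```

If the two versions differ, as they do here, the compiled `detectron2._C` extension was most likely built in a different CUDA environment than the one it is being loaded in.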

Environment

Hardware

OS: Ubuntu 20.04.5 LTS on WSL 2
CPU: Intel Core i9-10940X
GPU: NVIDIA TITAN RTX (Turing architecture)
DRAM: 100GB

nvidia-smi


Hi @tsh11na,
The Detectron2-ResNeSt version may well be what is causing the problem, and I see that the repo does not specify which version to use.

@nabeelkhalid92, can you help out with which version you used?

Hi @tsh11na,
You have to install Detectron2 built for the same CUDA version as your environment, i.e., 10.1.
You can find the matching Detectron2 builds here: detectron2 installations
Also, the anchor-based model was implemented with Python v.3.6.10, the deep learning framework PyTorch v.1.5.0, and the object detection library Detectron2 v.2.1.
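As a quick sanity check before rebuilding, the versions quoted above can be compared against what is actually installed in the container. This is only a sketch: the `EXPECTED` table and the helpers `collect_versions` and `find_mismatches` are hypothetical names introduced here, and the script assumes PyTorch may or may not be importable.

```python
import platform

# Versions quoted in this thread, treated here as the target environment.
EXPECTED = {"python": "3.6.10", "torch": "1.5.0", "cuda": "10.1"}


def collect_versions():
    """Gather the Python, PyTorch, and PyTorch-CUDA versions of this interpreter."""
    found = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch
    except ImportError:
        return found  # PyTorch is not installed; leave its entries as None
    # Drop a local build suffix such as "+cu101" before comparing.
    found["torch"] = str(torch.__version__).split("+")[0]
    # CUDA version PyTorch was built with (None for CPU-only builds).
    found["cuda"] = torch.version.cuda
    return found


def find_mismatches(expected, found):
    """Return {name: (expected, found)} for every entry that differs."""
    return {
        name: (want, found.get(name))
        for name, want in expected.items()
        if found.get(name) != want
    }


for name, (want, got) in find_mismatches(EXPECTED, collect_versions()).items():
    print("{}: expected {}, found {}".format(name, want, got))
```

An empty output means the interpreter matches the quoted versions. Detectron2 itself still has to be compiled or installed against that same CUDA version, which this check does not cover.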