QData/spacetimeformer

OSError: libtorch_global_deps.so: cannot open shared object file: No such file or directory

mateibejan1 opened this issue · 2 comments

After creating the environment and running the script provided in the Example Spacetimeformer Training Commands section of the repo, I get the following stack trace:

Traceback (most recent call last):
  File "train.py", line 7, in <module>
    import pytorch_lightning as pl
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 30, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.base import Callback
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/site-packages/pytorch_lightning/callbacks/base.py", line 21, in <module>
    import torch
  File "/home/fsuser/.local/lib/python3.8/site-packages/torch/__init__.py", line 196, in <module>
    _load_global_deps()
  File "/home/fsuser/.local/lib/python3.8/site-packages/torch/__init__.py", line 149, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/fsuser/miniconda3/envs/spacetimeformer/lib/python3.8/ctypes/__init__.py", line 369, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/fsuser/.local/lib/python3.8/site-packages/torch/lib/libtorch_global_deps.so: cannot open shared object file: No such file or directory

The same stack trace can be reproduced by simply importing torch in a .py script.

I've checked /home/fsuser/.local/lib/python3.8/site-packages/torch/lib/ and there is no libtorch_global_deps.so file. Do I have to pull it from somewhere or install some other torch library?

I'm running this code on Ubuntu 20.04 with Python 3.8, torch 1.9.0, and CUDA 10.2.
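
One detail worth noting in the traceback above: pytorch_lightning is loaded from the conda environment (/home/fsuser/miniconda3/envs/spacetimeformer/...), while torch is loaded from /home/fsuser/.local/lib/python3.8/site-packages, which is the per-user site-packages directory. The sketch below is one possible way to check where Python would import a package from without actually importing it, so a broken torch install does not raise. The helper names locate_package and is_inside are hypothetical and introduced only for illustration.

```python
import importlib.util
import site
import sys
from pathlib import Path


def locate_package(name):
    """Return the directory a top-level package would be imported from, or None.

    find_spec() on a top-level name only searches sys.path; it does not
    execute the package, so a broken install will not raise here.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def is_inside(path, root):
    """True if `path` is `root` itself or lies somewhere below it."""
    path, root = Path(path).resolve(), Path(root).resolve()
    return path == root or root in path.parents


if __name__ == "__main__":
    user_site = site.getusersitepackages()
    print("interpreter prefix:", sys.prefix)
    print("user site-packages:", user_site)
    for pkg in ("torch", "pytorch_lightning"):
        where = locate_package(pkg)
        if where is None:
            print(f"{pkg}: not found on sys.path")
        else:
            origin = "user site" if is_inside(where, user_site) else "elsewhere"
            print(f"{pkg}: {where} ({origin})")
```

If torch is reported under the user site while pytorch_lightning is not, the environment is probably mixing two separate installations.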

Hi, this looks like a generic PyTorch installation error that isn't related to the spacetimeformer code or to PyTorch Lightning (a third-party library that basically handles training-loop boilerplate). Installing PyTorch can be surprisingly tricky at times, especially with CUDA version conflicts and so on.

I recommend making a new environment and installing the latest version of PyTorch (1.11), which has been tested with the latest version of this repo. I've had to set up PyTorch on a bunch of different servers, and in my experience it's usually easier to fix CUDA compatibility issues by installing with conda rather than pip. See https://pytorch.org/get-started/locally/ for the install commands.

You'll know it worked if you can do

import torch
torch.cuda.is_available()

and get True.
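
A slightly fuller version of that check, as a sketch: it also reports which file torch was imported from and which CUDA version the build targets, and it catches the import failure instead of crashing. The function name torch_report is hypothetical; the torch attributes it reads (torch.__version__, torch.__file__, torch.version.cuda, torch.cuda.is_available()) are standard.

```python
def torch_report():
    """Collect basic facts about the torch install, or the import error."""
    try:
        import torch
    except (ImportError, OSError) as exc:
        # OSError covers a missing shared library such as the one in this issue.
        return {"error": f"{type(exc).__name__}: {exc}"}
    return {
        "version": torch.__version__,
        "file": torch.__file__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

The "file" entry is useful here because it shows whether the imported torch actually lives inside the new environment.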

Thanks for getting back to me!

I've recreated the environment with Python 3.8, torch 1.11.0, and CUDA 10.2, and installed the requirements via pip. However, the error still persists.
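
Since the traceback earlier in this thread resolved torch from /home/fsuser/.local/... rather than from the conda environment, a recreated environment could still be picking up an older per-user copy. That is a hypothesis, not something confirmed in this thread. One way to test it is to start a fresh interpreter with the -s flag (equivalent to setting PYTHONNOUSERSITE=1), which leaves the user site-packages directory off sys.path, and compare where torch resolves from. The helper torch_origin below is a hypothetical name used for illustration.

```python
import subprocess
import sys

# One-liner run in a child interpreter: locate torch without importing it.
PROBE = (
    "import importlib.util as u; "
    "s = u.find_spec('torch'); "
    "print(s.origin if s else 'torch not found')"
)


def torch_origin(disable_user_site):
    """Ask a fresh interpreter where it would import torch from."""
    cmd = [sys.executable]
    if disable_user_site:
        cmd.append("-s")  # same effect as setting PYTHONNOUSERSITE=1
    cmd += ["-c", PROBE]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("default     :", torch_origin(disable_user_site=False))
    print("no user site:", torch_origin(disable_user_site=True))
```

If the two lines differ, the copy under ~/.local is shadowing the one in the environment, and removing or reinstalling that per-user copy would be the next thing to try.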