Project-MONAI/GenerativeModels

Runtime error while running 3d_ldm_tutorial.py

sinjan3101 opened this issue · 3 comments

@ericspod please have a look and let me know if I am using the wrong version of any of the libraries.

python3 tutorials/generative/3d_ldm/3d_ldm_tutorial.py
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Failed to load image Python extension: '/home/sinjan/.local/lib/python3.9/site-packages/torchvision/image.so: undefined symbol: _ZN3c104cuda9SetDeviceEi' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
MONAI version: 1.2.0
Numpy version: 1.26.0
Pytorch version: 2.0.1+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: c33f1ba588ee00229a309000e888f9817b4f1934
MONAI file: /home/sinjan/.local/lib/python3.9/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.10
ITK version: 5.3.0
Nibabel version: 5.1.0
scikit-image version: 0.22.0
Pillow version: 10.0.1
Tensorboard version: 2.14.1
gdown version: 4.7.1
TorchVision version: 0.16.0+cu121
tqdm version: 4.66.1
lmdb version: 1.4.1
psutil version: 5.9.5
pandas version: 2.1.1
einops version: 0.7.0
transformers version: 4.34.0
mlflow version: 2.7.1
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

/home/sinjan/mnai/Task01_BrainTumour/
monai.transforms.io.dictionary LoadImaged.__init__:image_only: Current default value of argument image_only=False has been deprecated since version 1.1. It will be changed to image_only=True in version 1.3.
<class 'monai.transforms.utility.dictionary.AddChanneld'>: Class AddChanneld has been deprecated since version 0.8. It will be removed in version 1.3. please use MetaTensor data type and monai.transforms.EnsureChannelFirstd instead with channel_dim='no_channel'.
monai.transforms.utility.dictionary EnsureChannelFirstd.__init__:meta_keys: Argument meta_keys has been deprecated since version 0.9. not needed if image is type MetaTensor.
Loading dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 388/388 [03:23<00:00, 1.91it/s]
Image shape torch.Size([1, 96, 96, 64])
Using cuda
The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=SqueezeNet1_1_Weights.IMAGENET1K_V1. You can also use weights=SqueezeNet1_1_Weights.DEFAULT to get the most up-to-date weights.
Epoch 0: 100%|██████████████████| 194/194 [02:02<00:00, 1.58it/s, recons_loss=0.064, gen_loss=0, disc_loss=0]
Epoch 1: 100%|█████████████████| 194/194 [01:17<00:00, 2.51it/s, recons_loss=0.0394, gen_loss=0, disc_loss=0]
Epoch 2: 100%|█████████████████| 194/194 [01:17<00:00, 2.50it/s, recons_loss=0.0343, gen_loss=0, disc_loss=0]
Epoch 3: 100%|█████████████████| 194/194 [01:17<00:00, 2.50it/s, recons_loss=0.0326, gen_loss=0, disc_loss=0]
Epoch 4: 100%|█████████████████| 194/194 [01:17<00:00, 2.49it/s, recons_loss=0.0293, gen_loss=0, disc_loss=0]
Epoch 5: 100%|█████████████████| 194/194 [01:18<00:00, 2.48it/s, recons_loss=0.0285, gen_loss=0, disc_loss=0]
Epoch 6: 100%|█████████| 194/194 [01:49<00:00, 1.77it/s, recons_loss=0.0271, gen_loss=0.475, disc_loss=0.348]
Epoch 7: 100%|█████████| 194/194 [01:47<00:00, 1.81it/s, recons_loss=0.0284, gen_loss=0.594, disc_loss=0.204]
Epoch 8: 100%|█████████| 194/194 [01:48<00:00, 1.79it/s, recons_loss=0.0296, gen_loss=0.599, disc_loss=0.212]
Epoch 9: 100%|█████████| 194/194 [01:48<00:00, 1.78it/s, recons_loss=0.0295, gen_loss=0.508, disc_loss=0.216]
Epoch 10: 100%|████████| 194/194 [01:49<00:00, 1.77it/s, recons_loss=0.0288, gen_loss=0.411, disc_loss=0.223]
Epoch 11: 100%|████████| 194/194 [01:51<00:00, 1.74it/s, recons_loss=0.0277, gen_loss=0.417, disc_loss=0.215]
Epoch 12: 100%|█████████| 194/194 [01:49<00:00, 1.77it/s, recons_loss=0.027, gen_loss=0.429, disc_loss=0.226]
Epoch 13: 100%|██████████| 194/194 [01:49<00:00, 1.77it/s, recons_loss=0.0268, gen_loss=0.4, disc_loss=0.228]
Epoch 14: 100%|█████████| 194/194 [01:50<00:00, 1.76it/s, recons_loss=0.026, gen_loss=0.385, disc_loss=0.226]
Epoch 15: 100%|████████| 194/194 [01:52<00:00, 1.72it/s, recons_loss=0.0261, gen_loss=0.387, disc_loss=0.221]
Epoch 16: 100%|████████| 194/194 [01:51<00:00, 1.74it/s, recons_loss=0.0258, gen_loss=0.394, disc_loss=0.222]
Epoch 17: 100%|█████████| 194/194 [01:53<00:00, 1.72it/s, recons_loss=0.026, gen_loss=0.385, disc_loss=0.227]
Epoch 18: 100%|████████| 194/194 [01:50<00:00, 1.75it/s, recons_loss=0.0254, gen_loss=0.383, disc_loss=0.228]
Epoch 19: 0%| | 0/194 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/sinjan/mnai/GenerativeModels/tutorials/generative/3d_ldm/3d_ldm_tutorial.py", line 204, in <module>
for step, batch in progress_bar:
File "/home/sinjan/.local/lib/python3.9/site-packages/tqdm/std.py", line 1182, in __iter__
for obj in iterable:
File "/home/sinjan/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/sinjan/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1328, in _next_data
idx, data = self._get_data()
File "/home/sinjan/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1294, in _get_data
success, data = self._try_get_data()
File "/home/sinjan/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1132, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/usr/lib/python3.9/multiprocessing/queues.py", line 122, in get
return _ForkingPickler.loads(res)
File "/home/sinjan/.local/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 307, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.9/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/usr/lib/python3.9/multiprocessing/reduction.py", line 189, in recv_handle
return recvfds(s, 1)[0]
File "/usr/lib/python3.9/multiprocessing/reduction.py", line 164, in recvfds
raise RuntimeError('received %d items of ancdata' %
RuntimeError: received 0 items of ancdata

I can't say exactly what this issue is, but the first result I find online offers some solutions: https://discuss.pytorch.org/t/runtimeerror-received-0-items-of-ancdata/4999/7 I haven't seen this error before and suspect it comes from PyTorch alone rather than from MONAI/Generative.

@sinjan3101 Glad to hear it, though I think the underlying issue is the limits set with ulimit. Depending on how your system is administered, you might not be able to change these. This solution works, so just stick with it.
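
For anyone who wants to check the ulimit angle mentioned above, here is a minimal sketch that inspects the per-process open-file limit and raises the soft limit from inside Python. It uses only the standard-library `resource` module (Unix only). The helper name `raise_open_file_limit` and the target of 4096 are illustrative assumptions, not something taken from the tutorial or from this thread, and the thread does not say which workaround was actually applied.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft open-file limit toward `target` (hypothetical helper).

    The soft limit can only go up to the hard limit without extra
    privileges, so the requested value is capped at the hard limit.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Cap the request at the hard limit unless the hard limit is unlimited.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # The system refused the change; keep the existing limits.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_open_file_limit())
```

This would need to run at the top of the training script, before the `DataLoader` workers start. If the hard limit itself is too low and cannot be changed, another workaround commonly suggested for this error is calling `torch.multiprocessing.set_sharing_strategy("file_system")` before creating the data loaders, which avoids passing file descriptors between worker processes.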