Code crashes randomly with num_workers > 0
I get the following error, and it happens completely at random. It doesn't make sense. With num_workers=0 everything works just fine.
  File "/opt/conda/lib/python3.8/multiprocessing/queues.py", line 108, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 327, in <module>
    main()
  File "train.py", line 129, in main
    train(epoch, image_extractor, model, trainloader, optimizer, writer)
  File "train.py", line 159, in train_normal
    for idx, data in tqdm(enumerate(trainloader), total=len(trainloader), desc = 'Training'):
  File "/opt/conda/lib/python3.8/site-packages/tqdm/std.py", line 1166, in __iter__
    for obj in iterable:
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
    success, data = self._try_get_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1003, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 692128) exited unexpectedly
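For anyone reading this traceback later: the chained `_queue.Empty` → `RuntimeError` pair comes from the main process polling its result queue with a timeout. When a poll times out, it checks whether any worker process is still alive and, if one has died, re-raises the timeout as the RuntimeError shown above. Below is a minimal pure-Python sketch of that pattern. It is loosely modeled on the `_try_get_data` frame in the traceback but simplified, and the names `try_get_data` and `WorkerLike` are illustrative only, not PyTorch's API.

```python
import queue
from typing import Any, Iterable, Protocol, Tuple


class WorkerLike(Protocol):
    """Anything exposing a pid and is_alive(), e.g. multiprocessing.Process."""

    pid: int

    def is_alive(self) -> bool: ...


def try_get_data(data_queue: "queue.Queue[Any]",
                 workers: Iterable[WorkerLike],
                 timeout: float = 0.1) -> Tuple[bool, Any]:
    """Poll the queue once; on timeout, check whether any worker has died.

    Simplified illustration of the polling pattern, not PyTorch's source.
    """
    try:
        return True, data_queue.get(timeout=timeout)
    except queue.Empty as e:
        dead = [w for w in workers if not w.is_alive()]
        if dead:
            pids = ", ".join(str(w.pid) for w in dead)
            # `from e` is what prints "The above exception was the direct
            # cause of the following exception" between the two tracebacks.
            raise RuntimeError(
                f"DataLoader worker (pid(s) {pids}) exited unexpectedly") from e
        # Every worker is still alive: the batch is just slow, so the
        # caller is expected to poll again.
        return False, None
```

Note that the main process only learns *that* a worker died, not *why*. The actual cause (a crash or kill inside the worker) has to be investigated on the worker side, which is consistent with num_workers=0 hiding the problem.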
Hi @gulzainali98, thanks for spotting this! I will look into this as well and let you know.
The error only occurs intermittently. Sometimes there is no problem at all.
This was a PyTorch problem. Everything works fine on PyTorch 1.8.
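If you want a training script to fail fast on older installs, a startup version check is one option. This is a minimal sketch, assuming (per the comment above) that 1.8 is a known-good version. The thread does not say exactly which earlier versions were affected, and the helpers `parse_version` and `require_min_version` are hypothetical names, not part of PyTorch.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '1.7.1+cu110' or '1.8.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def require_min_version(version: str, minimum: Tuple[int, int] = (1, 8)) -> None:
    """Raise if `version` is older than `minimum` (hypothetical helper)."""
    if parse_version(version) < minimum:
        raise RuntimeError(
            f"PyTorch {version} found; {minimum[0]}.{minimum[1]}+ is assumed "
            "to avoid the random DataLoader worker crashes")
```

Usage would be something like `import torch; require_min_version(torch.__version__)` at the top of `train.py`. Comparing `(major, minor)` tuples avoids the string-comparison trap where `"1.10" < "1.8"`.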