tbepler/topaz

Topaz training error in multiprocessing data loader

tbepler opened this issue.

Discussed in #172

Originally posted by wuhucryoem July 24, 2023
When I run topaz train, it fails with a "No such file or directory" error, but the message doesn't say which file or directory is missing. The output looks like this:

```
Traceback (most recent call last):
  File "/home/amax/miniconda3/envs/topaz/bin/topaz", line 33, in <module>
    sys.exit(load_entry_point('topaz-em==0.2.5', 'console_scripts', 'topaz')())
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 148, in main
    args.func(args)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 695, in main
    , save_prefix=save_prefix, use_cuda=use_cuda, output=output)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 577, in fit_epochs
    , use_cuda=use_cuda, output=output)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 552, in fit_epoch
    for X,Y in data_iterator:
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1152, in _get_data
    success, data = self._try_get_data()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
    fd = df.detach()
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/amax/miniconda3/envs/topaz/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
```
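Why the message names no path: the failing call is `s.connect(address)` in `multiprocessing.connection.SocketClient`, reached while `rebuild_storage_fd` asks the data loader worker's resource sharer (over a Unix-domain socket) for a shared tensor's file descriptor. If that socket path does not exist, `connect()` raises `FileNotFoundError` without attaching the path to the exception. The sketch below reproduces just that behavior with the standard library; `connect_to_missing_socket` and the `listener-example` path are hypothetical stand-ins, not anything Topaz or PyTorch defines, and it assumes a Linux/Unix system where `socket.AF_UNIX` is available.

```python
import os
import socket
import tempfile


def connect_to_missing_socket():
    """Try to connect to a Unix-domain socket path that does not exist.

    Returns the caught exception and the path that was attempted.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical stand-in for a worker's resource-sharer socket;
        # the file is never created, so nothing is listening there.
        missing_path = os.path.join(tmpdir, "listener-example")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(missing_path)
        except FileNotFoundError as exc:
            return exc, missing_path
        finally:
            sock.close()
    raise RuntimeError("connect unexpectedly succeeded")


if __name__ == "__main__":
    err, path = connect_to_missing_socket()
    print(err)           # [Errno 2] No such file or directory
    print(err.filename)  # None: the socket path is not stored on the exception
    print("attempted path:", path)
```

The printed message matches the last line of the traceback, which is why the report cannot point to a concrete file: the missing item is an interprocess socket, not one of the training inputs. A workaround often suggested for this PyTorch error, though not confirmed in this thread, is to switch the tensor sharing strategy with `torch.multiprocessing.set_sharing_strategy('file_system')` or to run the data loader without worker processes.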