ERROR multiprocessing.pool.RemoteTraceback:
ccgauvin94 opened this issue · 3 comments
Using the command from the tutorial, with batch size 16 and 16 preprocessing CPUs, I get the following error when it reaches iteration 9:
"""
Traceback (most recent call last):
File "/opt/miniconda3/envs/isonet/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/opt/miniconda3/envs/isonet/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/opt/IsoNet/preprocessing/prepare.py", line 157, in get_cubes
get_cubes_one(data_X, data_Y, settings, start = start)
File "/opt/IsoNet/preprocessing/prepare.py", line 95, in get_cubes_one
noise_volume = read_vol(path_noise[path_index])
File "/opt/IsoNet/preprocessing/prepare.py", line 92, in read_vol
with mrcfile.open(f) as mf:
File "/home/t93j956/.local/lib/python3.9/site-packages/mrcfile/load_functions.py", line 138, in open
return NewMrc(name, mode=mode, permissive=permissive,
File "/home/t93j956/.local/lib/python3.9/site-packages/mrcfile/mrcfile.py", line 108, in __init__
self._open_file(name)
File "/home/t93j956/.local/lib/python3.9/site-packages/mrcfile/mrcfile.py", line 125, in _open_file
self._iostream = open(name, self._mode + 'b')
OSError: [Errno 9] Bad file descriptor: 'results/training_noise/n_00363.mrc'
"""
This has happened twice now; I'm not sure why. GPU usage is at 33 GB out of 45 GB, but does it go up when the noise model generation starts?
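For context, the failing pattern boils down to the pool workers each opening noise volumes with mrcfile, roughly like the sketch below. This is a simplified illustration based on the traceback, not IsoNet's actual prepare.py; the path pattern, volume count, and pool size are just placeholders matching my run.

```python
# Minimal sketch of the failing pattern (not IsoNet's actual code): each pool
# worker opens noise volumes from results/training_noise with mrcfile.
import multiprocessing
import mrcfile

def read_vol(path):
    # Same idea as IsoNet's read_vol: open the MRC file and return its data.
    with mrcfile.open(path) as mf:
        return mf.data.copy()

def get_cubes(path):
    vol = read_vol(path)
    return vol.shape

if __name__ == "__main__":
    # Placeholder file names; IsoNet generates ~1000 of these by default.
    paths = [f"results/training_noise/n_{i:05d}.mrc" for i in range(1000)]
    with multiprocessing.Pool(16) as pool:
        shapes = pool.map(get_cubes, paths)  # this is where the OSError surfaces
```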
Hi,
This problem should happen at the very beginning of iteration 10 with the default noise settings, when IsoNet generates 1000 noise volumes using the CPU.
Please check whether you already have a results/training_noise folder, whether IsoNet has already generated some noise MRC files in that folder, and whether you have enough disk space.
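If it helps, a quick check like the one below (assuming the default results/ layout) will confirm the folder exists, count how many noise volumes were written, and report the free space on that filesystem:

```python
# Sanity check for the noise folder; assumes the default results/ output layout.
import os
import shutil

noise_dir = "results/training_noise"
if not os.path.isdir(noise_dir):
    print(f"{noise_dir} does not exist")
else:
    mrcs = [f for f in os.listdir(noise_dir) if f.endswith(".mrc")]
    print(f"{len(mrcs)} noise volumes generated so far")
    total, used, free = shutil.disk_usage(noise_dir)
    print(f"{free / 1e9:.1f} GB free on that filesystem")
```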
I was trying to do this on a mounted SMB volume. Switching to local scratch seems to fix the issue. I wonder if the connection wasn't staying open or something.
Best,
Colin
I guess your SMB volume might not allow simultaneous I/O from multiple processes. I do not have a solution, but I will keep this in mind.
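For anyone else who has to run on a network share, one possible workaround (a rough sketch, not something IsoNet ships) is to retry the read a few times before giving up, so a transient SMB failure doesn't kill the worker:

```python
# Hypothetical retry wrapper for flaky network filesystems; not part of IsoNet.
import time
import mrcfile

def read_vol_with_retry(path, retries=3, delay=1.0):
    for attempt in range(retries):
        try:
            with mrcfile.open(path) as mf:
                return mf.data.copy()
        except OSError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # give the SMB mount a moment before retrying
```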