PaulCCCCCCH/Robustar_implementation

Insufficient shared memory in launcher


The launcher gave an error of insufficient shared memory during the training process.
The most likely cause and its solutions are described here:

> Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.

While --ipc=host is a convenient workaround, it may introduce security issues, since the container then shares the host's IPC namespace (see here).

Using --shm-size may be the better option, but we do not yet know how much shared memory we should allocate to the launcher, or what the consequences of allocating a large amount would be. A sketch of how the size could be set programmatically is given below.
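For reference, here is a minimal sketch of passing an explicit shared memory size when the container is created, assuming the launcher starts the training container through the Docker SDK for Python (docker-py); the image tag and the 2g value are placeholders for illustration, not values chosen for Robustar:

```python
# Sketch only: raise /dev/shm for the training container at creation time.
# Assumes the launcher uses docker-py; image tag and size are placeholders.
import docker

client = docker.from_env()

container = client.containers.run(
    "paulcccccch/robustar:latest",  # placeholder image tag
    detach=True,
    shm_size="2g",      # equivalent of `docker run --shm-size=2g`
    # ipc_mode="host",  # equivalent of `--ipc=host`; shares the host IPC namespace
)
```

As far as we understand, the shm size is an upper bound on the container's tmpfs rather than an up-front reservation, so a somewhat generous limit may be less costly than it first appears; this still needs to be verified.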

More investigation will be done before we modify this part of the code.