marcomusy/vedo

Vedo and torch.multiprocessing


Hi, using DDP in torch leads to the following error when calling plotter.show:

Traceback (most recent call last):
  File "/home/user/train.py", line 627, in <module>
    mp.spawn(
  File "/home/user/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/user/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 197, in start_processes
    while not context.join():
  File "/home/user/venv/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 140, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with signal SIGSEGV
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 2 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Here is a code sample:

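import vedo

# pts (a torch tensor) and legends[idx] are defined elsewhere in the training script
plotter = vedo.Plotter(N=1, offscreen=False)
# Copy the points to host memory before handing them to vedo
pointcloud = vedo.Points(pts.detach().cpu().numpy())
plotter.show(pointcloud, legends[idx])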

My vedo version is vedo==2023.4.6.

The plotter is visible only on the main process, and the data it uses has been cloned/detached to prevent any shared access. Is this a known problem between vedo and torch.multiprocessing?

To be honest, I have no experience with torch.multiprocessing. I can only suggest upgrading vedo to the latest version, but I'm not sure that will cure the problem.
If I'm not mistaken, the upstream VTK needs to start the interactive window in the main thread.
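
If that guess about VTK is right, one thing to try is keeping every vedo call in the parent process (the one that calls mp.spawn) and letting the spawned workers only send plain point data back through a queue. The sketch below is untested against the setup in this issue and is not a confirmed fix. The names worker, drain and render_with_vedo are hypothetical, the worker body is a stand-in for the real training step, and passing a label string to plotter.show simply mirrors the sample above.

```python
def worker(rank, point_queue):
    """Hypothetical spawned worker: produces points but never touches vedo."""
    # Stand-in for the real training step; in the issue this would be
    # pts.detach().cpu().numpy() taken from the model output.
    pts = [[float(rank), 0.0, 0.0], [float(rank), 1.0, 0.0]]
    point_queue.put((rank, pts))
    # A None payload is a sentinel meaning "this worker is finished".
    point_queue.put((rank, None))


def drain(point_queue, n_workers, render, timeout=30.0):
    """Run in the parent process: render items until every worker is done."""
    finished = 0
    rendered = 0
    while finished < n_workers:
        rank, pts = point_queue.get(timeout=timeout)
        if pts is None:
            finished += 1
            continue
        render(rank, pts)
        rendered += 1
    return rendered


def render_with_vedo(rank, pts):
    """Parent-side render callback; vedo is imported only here."""
    import vedo  # lazy import so the workers never load VTK

    plotter = vedo.Plotter(N=1, offscreen=False)
    plotter.show(vedo.Points(pts), f"rank {rank}")
```

In a real script the queue would presumably come from torch.multiprocessing.get_context("spawn").Queue(), and the workers would be started with mp.spawn(worker, args=(point_queue,), nprocs=n_workers, join=False) so that the parent stays free to call drain(point_queue, n_workers, render_with_vedo) before joining the returned context. Whether this avoids the SIGSEGV depends on the actual cause, which the thread does not establish.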