lshqqytiger/ZLUDA

torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 3221225477


"I want to try training with zluda, but I always encounter the error 'torch.multiprocessing.spawn.ProcessRaisedException:'. I found that this is when using torch.nn.parallel.DistributedDataParallel. The program will exit automatically and there will be no logs about DistributedDataParallel. I have tried increasing virtual memory and running as an administrator, but neither has solved the problem."

Traceback (most recent call last):
  File "D:\GPT-SoVITS-beta0217\GPT_SoVITS\s2_train.py", line 600, in <module>
    main()
  File "D:\GPT-SoVITS-beta0217\GPT_SoVITS\s2_train.py", line 56, in main
    mp.spawn(
  File "D:\GPT-SoVITS-beta0217\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 239, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "D:\GPT-SoVITS-beta0217\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 197, in start_processes
    while not context.join():
  File "D:\GPT-SoVITS-beta0217\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "D:\GPT-SoVITS-beta0217\runtime\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "D:\GPT-SoVITS-beta0217\GPT_SoVITS\s2_train.py", line 85, in run
    train_dataset = TextAudioSpeakerLoader(hps.data) ########
  File "D:\GPT-SoVITS-beta0217\GPT_SoVITS\module\data_utils.py", line 35, in __init__
    assert os.path.exists(self.path2)
AssertionError

Traceback (most recent call last):
  File "train.py", line 329, in <module>
    main()
  File "train.py", line 44, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "D:\so-vits-svc_2\so-vits-svc\workenv\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "D:\so-vits-svc_2\so-vits-svc\workenv\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "D:\so-vits-svc_2\so-vits-svc\workenv\lib\site-packages\torch\multiprocessing\spawn.py", line 149, in join
    raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 0 terminated with exit code 3221225477
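A note on reading the exit code in this traceback: on Windows, exit codes this large are usually NTSTATUS values, and they are easier to recognise in hexadecimal. The sketch below converts the number and looks it up in a small table. The helper name describe_exit_code and the table are illustrative assumptions, not part of PyTorch or ZLUDA, and the table lists only a few well-known codes.

```python
# Hypothetical helper: decode a Windows process exit code as an NTSTATUS value.
# The table is deliberately tiny; it is an illustration, not a complete list.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(code: int) -> str:
    """Return the exit code in hex, with a name when it is in the table."""
    unsigned = code & 0xFFFFFFFF  # some tools report the code as a negative int
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(describe_exit_code(3221225477))
```

For the code reported here this gives 0xC0000005, an access violation. That indicates the worker process crashed in native code rather than raising a Python exception, which is why no Python traceback from the worker appears. It does not by itself say which library caused the crash.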

This doesn't seem to be a ZLUDA issue.
Please check whether the path stored in self.path2 exists.

assert os.path.exists(self.path2)
AssertionError
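The check suggested in the reply can be made more informative than a bare assert, which reports nothing about the path that is missing. The following is a minimal sketch, assuming the dataset code can be edited locally; require_path is a hypothetical helper and is not part of GPT-SoVITS.

```python
import os


def require_path(path: str, label: str = "path") -> str:
    """Return path unchanged if it exists; otherwise raise with the full path.

    Hypothetical replacement for `assert os.path.exists(...)`, so that the
    error message names the missing location.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f"{label} does not exist: {os.path.abspath(path)}")
    return path


# Example call with a path that exists (the current working directory).
print(require_path(os.getcwd(), "working directory"))
```

Inside the dataset's __init__, a call such as require_path(self.path2, "path2") in place of the assert would put the missing path in the error message, which makes it easier to see whether a preprocessing step was skipped or the path is misconfigured.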