May I ask about this error?
Could you tell me how to solve this problem?
(talk3d) F:\Talk3D>sh demo.sh
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8'
@@@@@@@@@@@@@@@@@@@@@
@ Training Talk3D @
@@@@@@@@@@@@@@@@@@@@@
N_gpus: 1
No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8'
Traceback (most recent call last):
File "main.py", line 106, in
spawn_mp(_main, world_size)
File "main.py", line 39, in spawn_mp
mp.spawn(running_fn,args=(world_size,),nprocs=world_size,join=True)
File "C:\Users\User.conda\envs\talk3d\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "C:\Users\User.conda\envs\talk3d\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
while not context.join():
File "C:\Users\User.conda\envs\talk3d\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\Users\User.conda\envs\talk3d\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "F:\Talk3D\main.py", line 29, in _main
setup(rank, world_size,opts)
File "F:\Talk3D\main.py", line 35, in setup
distributed.init_process_group('nccl', rank=rank, world_size=world_size)
File "C:\Users\User.conda\envs\talk3d\lib\site-packages\torch\distributed\distributed_c10d.py", line 761, in init_process_group
default_pg = _new_process_group_helper(
File "C:\Users\User.conda\envs\talk3d\lib\site-packages\torch\distributed\distributed_c10d.py", line 886, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
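From the message, it seems the Windows build of torch does not ship the NCCL backend, so distributed.init_process_group('nccl', ...) cannot initialize here. Would a gloo fallback along these lines be an option? This is only a rough sketch of how setup() in main.py might be adjusted, not the actual repo code, and I don't know whether Talk3D supports gloo.

```python
# Rough sketch only -- main.py's real setup() may differ; gloo support in Talk3D is just a guess.
import torch.distributed as distributed

def setup(rank, world_size, opts):
    # NCCL is not included in Windows builds of PyTorch, so fall back to gloo there.
    backend = 'nccl' if distributed.is_nccl_available() else 'gloo'
    distributed.init_process_group(backend, rank=rank, world_size=world_size)
```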
Could you tell me how you installed the torch library?
It looks like your environment's CUDA and torch versions don't match. This install command
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
or the other commands from this link that start with pip install torch==x.xx.x+cu11x..., such as
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
might help, but I'm not sure.
Please let me know if the commands above do not work.
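If it helps, after reinstalling you can quickly check whether torch actually sees your CUDA runtime before running demo.sh again. This is just a generic sanity check, not anything Talk3D-specific:

```python
# Run inside the activated talk3d environment to confirm the install.
import torch
import torch.distributed as distributed

print(torch.__version__)                # e.g. 1.12.1+cu116
print(torch.version.cuda)               # CUDA version the wheel was built with
print(torch.cuda.is_available())        # should be True, otherwise "No CUDA runtime is found" will persist
print(distributed.is_nccl_available())  # NCCL backend availability (typically False on Windows)
```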