Sunmingy opened this issue 4 years ago · 1 comment
Hi @Sunmingy, if you're launching main.sh locally rather than on a SLURM cluster, you should use torch.distributed.launch. For example, replace
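line 16 of DeeperCluster/main.sh (commit d38ada1) with:
python -m torch.distributed.launch --nproc_per_node=$NGPU main.py
where $NGPU is the number of GPUs to use on your machine (one process is started per GPU).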
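When main.py is started this way instead of through SLURM, each of the $NGPU processes has to find out which local GPU it owns. Below is a minimal sketch, assuming main.py reads its local rank itself. DeeperCluster's actual main.py may already handle this differently, and the helper `get_local_rank` is hypothetical. By default torch.distributed.launch passes `--local_rank` (spelled `--local-rank` in newer PyTorch releases) to the script; with `--use_env`, or under torchrun, the value arrives in the `LOCAL_RANK` environment variable instead.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def get_local_rank(argv: Optional[Sequence[str]] = None,
                   env: Optional[Mapping[str, str]] = None) -> int:
    """Return this process's local rank (hypothetical helper, not DeeperCluster code).

    Checks the command-line flag set by torch.distributed.launch first,
    then falls back to the LOCAL_RANK environment variable, then to 0
    (single-process run).
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    # Accept both the underscore and the hyphen spelling of the flag.
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int, default=None)
    # parse_known_args leaves the script's other arguments untouched.
    args, _ = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    return int(env.get("LOCAL_RANK", 0))
```

Inside main.py the returned value would typically be passed to `torch.cuda.set_device(...)` before the process group is initialized, so that each process uses a different GPU.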