How to run training with a single GPU
Dmytro-Shvetsov opened this issue · 1 comment
Dmytro-Shvetsov commented
I am trying to launch training for any of the YOLOF models. However, when I run
pods_train --num-gpus 1 --num-machines 1
I get the following error:
Traceback (most recent call last):
  File "/cyclists/lib/YOLOF/tools/train_net.py", line 109, in <module>
    args=(args,),
  File "/cyclists/lib/YOLOF/cvpods/engine/launch.py", line 56, in launch
    main_func(*args)
  File "/cyclists/lib/YOLOF/tools/train_net.py", line 95, in main
    runner.train()
  File "/cyclists/lib/YOLOF/cvpods/engine/runner.py", line 270, in train
    super().train(self.start_iter, self.start_epoch, self.max_iter)
  File "/cyclists/lib/YOLOF/cvpods/engine/base_runner.py", line 84, in train
    self.run_step()
  File "/cyclists/lib/YOLOF/cvpods/engine/base_runner.py", line 185, in run_step
    loss_dict = self.model(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../yolof_base/yolof.py", line 134, in forward
    pred_logits, pred_anchor_deltas)
  File "../yolof_base/yolof.py", line 210, in losses
    dist.all_reduce(num_foreground)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 935, in all_reduce
    _check_default_pg()
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
Could you tell me what I am doing wrong?
My setup is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:00:10.0 Off |                  N/A |
|  0%   46C    P8     8W / 180W |     20MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
CUDA 10.1
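For reference, the traceback ends in `dist.all_reduce(num_foreground)`, which asserts that a default process group exists. Below is a minimal sketch of how such a call could be guarded so it also works in a single, non-distributed process. The helpers `all_reduce_if_initialized` and `world_size_or_one` are hypothetical names used only for illustration; they are not part of YOLOF or cvpods, and this may not be how the maintainers would address it.

```python
import torch
import torch.distributed as dist


def all_reduce_if_initialized(tensor: torch.Tensor) -> torch.Tensor:
    """Sum `tensor` across ranks if a process group exists; otherwise return it unchanged.

    Hypothetical helper for illustration only (not part of YOLOF or cvpods).
    """
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(tensor)  # in-place sum over all ranks
    return tensor


def world_size_or_one() -> int:
    """Return the number of ranks, or 1 when torch.distributed is not initialized."""
    if dist.is_available() and dist.is_initialized():
        return dist.get_world_size()
    return 1


# Stand-in for the foreground count computed in the loss (value is made up).
num_foreground = torch.tensor(12.0)
num_foreground = all_reduce_if_initialized(num_foreground)
avg_foreground = num_foreground / world_size_or_one()
```

In a single process without an initialized group, the tensor passes through unchanged and the divisor is 1, so the assertion is never reached.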
poodarchu commented