Multi-GPU
chenbys opened this issue · 2 comments
Thanks for the excellent paper and code.
Training with "--num-gpus 1" works, but "--num-gpus 2" fails with the following error:
python tools/train_net.py --num-gpus 2 --config-file configs/COCOA_cls-AmodalSegmentation/mask_rcnn_R_50_FPN_1x_parallel_CtRef_VAR_SPRef_SPRet_FM.yaml
....
File "/workspace/Amodal-Segmentation-Based-on-Visible-Region-Segmentation-and-Shape-Prior/detectron2/modeling/roi_heads/recon_net.py", line 318, in nearest_decode
    distances = torch.addmm(codebook_sqr + inputs_sqr,
RuntimeError: expected device cuda:0 but got device cuda:1
In addition, calls such as "dist.all_reduce", which aggregate tensors across GPUs, do not seem to appear in the code.
So how can this code be run on multiple GPUs? Can it be run with DP or DDP?
Anyway, I am not familiar with detectron2.
Thanks for any suggestions.
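On the question of aggregating tensors across GPUs: a minimal sketch of how a guarded "dist.all_reduce" call is commonly written in PyTorch, so that the same code path still works in a single-GPU run. The helper name all_reduce_sum is hypothetical and not part of this repository or of detectron2, and whether any tensor in this model (for example, codebook statistics) actually needs such a reduction is an assumption.

```python
import torch
import torch.distributed as dist


def all_reduce_sum(tensor: torch.Tensor) -> torch.Tensor:
    """Sum `tensor` across all processes; leave it unchanged in a single-process run.

    Hypothetical helper for illustration only (not from the repository).
    """
    if dist.is_available() and dist.is_initialized() and dist.get_world_size() > 1:
        # In-place element-wise sum over every rank in the default process group.
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    return tensor


# Single-process usage: no process group is initialized, so nothing is reduced.
counts = torch.tensor([1.0, 2.0, 3.0])
reduced = all_reduce_sum(counts.clone())
```

In a multi-process run every rank would have to call the helper with a tensor of the same shape, since all_reduce is a collective operation.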
I tried multi-GPU training with detectron2, but did not succeed. All of our experiments were run on a single GPU.
You may be able to find a solution in the Detectron2 issues.
Thanks for the comments.
I found that the following config runs directly on two GPUs: python tools/train_net.py --num-gpu 2 --config-file configs/COCOA_cls-AmodalSegmentation/mask_rcnn_R_50_FPN_1x_parallel.yaml
Therefore, the full model may need some modifications for multi-GPU compatibility.
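One modification that might address the "expected device cuda:0 but got device cuda:1" error is to move the codebook onto the same device as the inputs before computing distances. The sketch below is a stand-in, not the repository's recon_net.py: the function signature, tensor shapes, and the guess that the codebook is pinned to cuda:0 are all assumptions. It runs on CPU as written.

```python
import torch


def nearest_decode(inputs: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """For each row of `inputs` (N, D), return the index of the nearest row of `codebook` (K, D).

    Illustrative stand-in; the real function in recon_net.py may differ.
    """
    # Assumption: the error arises because the codebook sits on a different
    # device than the inputs, so align it with the inputs first.
    codebook = codebook.to(inputs.device)

    inputs_sqr = inputs.pow(2).sum(dim=1, keepdim=True)  # (N, 1)
    codebook_sqr = codebook.pow(2).sum(dim=1)            # (K,)

    # Squared distance: ||x||^2 + ||c||^2 - 2 * x . c, shape (N, K).
    distances = torch.addmm(
        codebook_sqr + inputs_sqr, inputs, codebook.t(), alpha=-2.0, beta=1.0
    )
    return distances.argmin(dim=1)


codebook = torch.tensor([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
inputs = torch.tensor([[0.9, 1.2], [4.0, 6.0], [-0.1, 0.2]])
indices = nearest_decode(inputs, codebook)
```

If the codebook is a plain tensor attribute of the module, registering it as a buffer or parameter would let it follow the module when it is moved to each GPU; whether that applies here depends on how the repository defines it.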