hyz-xmaster/VarifocalNet

dist_train keeps waiting

FX-STAR opened this issue · 4 comments

My env:
cuda10.2
torch==1.6.0
mmdetection==2.8.0
mmcv==1.2.4
After some iterations, GPU-Util reaches 100%, but the process just keeps waiting and training makes no progress.
Could you share your environment or offer any advice?

Hi, I haven't run into this problem, so I can't offer an effective solution. The code is tested with CUDA 10.1, PyTorch 1.6.0, mmdet 2.5.0, and mmcv 1.1.5. You may have a look at this page for more information about training.
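
If it helps to compare a local setup against the tested versions listed above, here is a minimal sketch that reads the installed package versions with the standard-library `importlib.metadata`. The helper names (`installed_version`, `compare_env`) are made up for illustration and are not part of this repo. The sketch assumes mmcv may be installed under the distribution name `mmcv-full`, and it does not check the CUDA version.

```python
from importlib import metadata

# Versions reported as tested in this thread (CUDA is not checked here).
TESTED = {"torch": "1.6.0", "mmdet": "2.5.0", "mmcv": "1.1.5"}


def installed_version(*candidates):
    """Return the version of the first installed distribution, or None."""
    for name in candidates:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def compare_env(tested=TESTED):
    """Map each package to (tested version, installed version, versions match)."""
    report = {}
    for pkg, want in tested.items():
        # Assumption: mmcv may be installed as "mmcv-full" instead of "mmcv".
        names = (pkg, "mmcv-full") if pkg == "mmcv" else (pkg,)
        have = installed_version(*names)
        # Drop a local build suffix such as "+cu101" before comparing.
        matches = have is not None and have.split("+")[0] == want
        report[pkg] = (want, have, matches)
    return report


if __name__ == "__main__":
    for pkg, (want, have, ok) in compare_env().items():
        status = "OK" if ok else "differs"
        print(f"{pkg}: tested {want}, installed {have} ({status})")
```

A mismatch is not necessarily the cause of the hang; it only shows where the environment differs from the tested one.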

@whoNamedCody, hi, has the problem been resolved? I'm facing this problem too.

No. I think the problem may be in 'GiouLoss', but I haven't debugged it yet.
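
One way to narrow down where a hung run is stuck (for example, whether it really is inside the loss) is to have Python periodically dump the stack of every thread. Below is a minimal sketch using the standard-library `faulthandler` module. The helper names and the 600-second interval are assumptions, not part of VarifocalNet or mmdetection, and the call would have to be added by hand near the start of the training script in each process.

```python
import faulthandler
import sys


def enable_hang_dump(interval_s=600.0, stream=None):
    """Dump the Python traceback of every thread each `interval_s` seconds.

    The timer is not reset by training progress, so dumps also appear during
    healthy runs; repeated dumps that show the same frames point at a stall.
    `stream` must be a real file, because faulthandler needs a file descriptor.
    """
    target = stream if stream is not None else sys.stderr
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=target)


def disable_hang_dump():
    """Cancel the periodic dump, e.g. once training has finished."""
    faulthandler.cancel_dump_traceback_later()
```

The dump only shows Python frames. If the process is blocked inside a CUDA kernel or a collective communication call, the traceback will end at the Python line that entered it, which may still be enough to tell whether the loss is involved.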