Training a 512x512 coco detector

Question

JimKlingshirn opened this issue 5 years ago · 2 comments

I successfully trained a 512x512 ResNet-101 detector on COCO using your code with Torch version 4.1 (Python 2.7, CUDA 10). It needed a couple of minor changes to the source code. In particular, the loss normalization in arm_loss.py and odm_loss.py has to divide by a float instead of an integer:

-        loss_l /= total_num
-        loss_c /= total_num
+        loss_l /= total_num.float()
+        loss_c /= total_num.float()
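
For anyone applying the same change, here is a minimal, self-contained sketch of what the patch does. It assumes `total_num` is the sum of an integer (Long) tensor of per-image positive-anchor counts, which is presumably why the division needed an explicit cast on this PyTorch version. The function name `normalize_losses` and the `clamp` guard are illustrative assumptions, not part of the repository.

```python
import torch


def normalize_losses(loss_l, loss_c, num_pos):
    """Divide summed losses by the positive-anchor count, cast to float.

    Hypothetical helper for illustration; the repository does this inline.
    """
    # num_pos holds integer counts, so its sum is an integer tensor.
    # Casting to float mirrors the `total_num.float()` change in the patch.
    total_num = num_pos.sum().float()
    # Assumption (not in the original patch): guard against a batch
    # with zero positive anchors to avoid dividing by zero.
    total_num = total_num.clamp(min=1.0)
    return loss_l / total_num, loss_c / total_num


# Toy values: two images with 3 and 5 positive anchors (8 in total).
num_pos = torch.tensor([[3], [5]])  # integer dtype by default
loss_l = torch.tensor(4.0)          # summed localization loss
loss_c = torch.tensor(8.0)          # summed classification loss

norm_l, norm_c = normalize_losses(loss_l, loss_c, num_pos)
print(norm_l.item(), norm_c.item())
```

Both results stay floating-point tensors, so they can be summed and backpropagated like the original losses.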

Using 4 GPUs and a batch size of 24, the ARM_conf_loss oscillated for the first several hundred iterations, but then it settled down and training proceeded normally. The final accuracy after 120K iterations is pretty good:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.333
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.542
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.358
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.153
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.385
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.476
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.288
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.444
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.472
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.257
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.535
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.636

Thanks for sharing your code.

Answer 1 · 2019-06-18T01:51:39.000Z

That's great! Thank you too.

Answer 2 · 2020-04-19T08:49:17.000Z

The oscillating loss problem has now been solved! Hope this helps.