AIWintermuteAI/aXeleRate

The loss does not converge when training the detector on VOC 2012


Describe the bug
Hi, I tried to reproduce your MobileNet_yolov2 result on the VOC 2012 dataset. During training, the mAP increases, but the value of the loss function is unstable and does not converge by the end.

To Reproduce

  1. Clone the repository to a local machine
  2. Modify the image/annotation folder paths in configs/pascal_20_detector.json (a sketch of the relevant fields follows this list)
  3. Run train.py -c configs/pascal_20_detector.json
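
For reference, a minimal sketch of how the dataset paths could be patched into the config. The key names under "train" follow the usual aXeleRate detector config layout but are assumptions here, and the VOC paths are placeholders - check configs/pascal_20_detector.json itself:

```python
# Hypothetical helper to point the detector config at a local VOC 2012 copy.
# Key names and folder paths are assumptions - verify them against
# configs/pascal_20_detector.json in the repository.
import json

with open("configs/pascal_20_detector.json") as f:
    config = json.load(f)

config["train"]["train_image_folder"] = "/data/VOCdevkit/VOC2012/JPEGImages"
config["train"]["train_annot_folder"] = "/data/VOCdevkit/VOC2012/Annotations"
config["train"]["valid_image_folder"] = "/data/VOCdevkit/VOC2012/JPEGImages"
config["train"]["valid_annot_folder"] = "/data/VOCdevkit/VOC2012/Annotations"

with open("configs/pascal_20_detector.json", "w") as f:
    json.dump(config, f, indent=4)
```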

Expected behavior
The loss value returned from (loss_xy + loss_wh + loss_conf + loss_class) should be close to zero after training.
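
For context, these components follow the standard YOLOv2 formulation; a conceptual sketch (not the exact aXeleRate implementation) of how the four terms combine:

```python
# Textbook YOLOv2-style loss components (a conceptual sketch, not the exact
# aXeleRate code). Every term is a sum of non-negative penalties over all
# boxes in all images, so the total only reaches ~0 if every box is
# predicted almost perfectly.
import numpy as np

def yolo_v2_loss(pred, true, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """pred/true: (num_boxes, 5 + num_classes) arrays holding x, y, w, h,
    objectness and class scores; obj_mask marks boxes containing an object."""
    noobj_mask = 1.0 - obj_mask
    loss_xy = lambda_coord * np.sum(obj_mask[:, None] * (pred[:, 0:2] - true[:, 0:2]) ** 2)
    loss_wh = lambda_coord * np.sum(obj_mask[:, None] * (np.sqrt(pred[:, 2:4]) - np.sqrt(true[:, 2:4])) ** 2)
    loss_conf = (np.sum(obj_mask * (pred[:, 4] - true[:, 4]) ** 2)
                 + lambda_noobj * np.sum(noobj_mask * (pred[:, 4] - true[:, 4]) ** 2))
    loss_class = np.sum(obj_mask[:, None] * (pred[:, 5:] - true[:, 5:]) ** 2)
    return loss_xy + loss_wh + loss_conf + loss_class
```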

Screenshots
The total loss value across all training steps:
[Screenshot 2021-07-12 14:35:18 - loss curve not reproduced]

Environment (please complete the following information):

  • Local machine -> Ubuntu 18.04 & RTX 3080 GPU & CUDA 11.2
  • Python -> python 3.6 & tf-nightly-gpu==2.6.0.dev20210420

Thanks for your help.

Hi there!
A few thoughts here:

  1. The epoch_loss graph looks really strange at the beginning - it seems that at epoch zero the loss takes 4 different values. Perhaps you stopped training with Ctrl-C and didn't manually delete the logs directory?
  2. The loss value returned from (loss_xy + loss_wh + loss_conf + loss_class) absolutely WILL NOT be close to zero after training. The Pascal VOC 2012 training set has ~17k pictures, each containing multiple objects. A close-to-zero loss would in practice mean the model predicts almost every bounding box perfectly, which is not going to happen with a rather small MobileNet model. You can get ~0 loss when training on smaller datasets, but there it would mean over-fitting.
    In general, you can consider the model converged to its best state when the mAP stops improving. It is a better metric for object detection than loss (see the sketch after this list).
  3. The old YOLOv2 loss actually had some problems, in particular when working with images of unequal dimensions (e.g. 320x240). You can try the new YOLOv3 loss in the dev branch - but you won't get ~0 loss on Pascal VOC 2012 either.
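
If you want to stop training automatically once the mAP plateaus, here is a minimal sketch, assuming the mAP value is exposed to Keras as a logged metric named "mAP" (aXeleRate computes mAP in its own evaluation callback, so the exact metric name and wiring may differ):

```python
# Minimal sketch: stop training and keep the best weights once mAP stops
# improving. The metric name "mAP" is an assumption - adapt it to whatever
# name the evaluation callback actually logs.
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    EarlyStopping(monitor="mAP", mode="max", patience=10,
                  restore_best_weights=True),
    ModelCheckpoint("best_map.h5", monitor="mAP", mode="max",
                    save_best_only=True),
]
# model.fit(..., callbacks=callbacks)
```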

Closing the issue due to inactivity.