When I train on my dataset, the loss becomes NaN after several epochs.
I tried this repo on our dataset, which had previously been trained successfully with CenterNet.
However, I tried several configs (e.g. removing ASFF, modifying the lr, loading COCO weights...), and the loss still became NaN after several epochs.
So far the only successful experiment is using the COCO weights without modifying num_class to match our dataset, which indicates that the pretrained guide anchor or some other part of the YOLO head is very important.
Our dataset is very small (12 iterations per epoch), so I also tried increasing the warmup epochs to 20, but the loss still became NaN.
Could you provide some suggestions?
I modified the warmup epochs to 50 and the training succeeded.
I still have some questions about the mixup process:
- The mixup weight of a box is only applied to obj_loss. Why not apply it to cls_loss and reg_loss as well?
- How should mixup be used in Faster R-CNN? Only in the RPN cls_loss?
I cannot figure out these details in the paper "Bag of Freebies for Training Object Detection Neural Networks" and would appreciate your reply!
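For what it's worth, my understanding of the detection mixup in that paper can be sketched as below. This is only an illustration, not the repo's actual code: the `mixup` helper, the flat-list "images", and the Beta(1.5, 1.5) ratio are my assumptions about the usual setup, where each box carries its image's mixing ratio as a weight that is later multiplied into obj_loss.

```python
import random

def mixup(img1, boxes1, img2, boxes2, alpha=1.5):
    """Blend two images and keep the boxes of both, each tagged with a
    per-box loss weight. Hypothetical sketch: images are flat lists of
    pixel values, boxes are arbitrary objects."""
    # Sample the mixing ratio from Beta(alpha, alpha); the "Bag of
    # Freebies" paper suggests alpha around 1.5 for detection mixup.
    lam = random.betavariate(alpha, alpha)
    mixed = [lam * a + (1 - lam) * b for a, b in zip(img1, img2)]
    # Boxes from image 1 get weight lam, boxes from image 2 get 1 - lam;
    # in the scheme discussed above, this weight only scales obj_loss.
    boxes = [(box, lam) for box in boxes1] + [(box, 1 - lam) for box in boxes2]
    return mixed, boxes
```

One plausible reason the weight is applied only to obj_loss: cls_loss and reg_loss are computed only for anchors matched to a ground-truth box, where the label is still fully valid, while objectness is the term that says "an object is here", which is exactly what the blending attenuates.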
If the learning rate is set too large, the loss will become NaN. You can consider reducing the learning rate or applying gradient clipping. I also had this problem.
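The gradient-clipping idea can be sketched as follows. In PyTorch you would just call `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)` between `loss.backward()` and `optimizer.step()`; the minimal re-implementation below shows what that does, using a plain list of gradient values for illustration.

```python
import math

def clip_grad_norm(grads, max_norm):
    """Scale gradients so their global L2 norm is at most max_norm.

    Simplified sketch of the usual clipping step; `grads` is a flat
    list of floats rather than real parameter tensors.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        # Rescale every gradient by the same factor so the direction
        # is preserved and only the magnitude is capped.
        scale = max_norm / (total_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads
```

Capping the gradient norm this way keeps one bad batch from producing a huge update that pushes the weights into a region where the loss overflows to NaN.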
Can you open-source your code?
@LUOBO123LUOBO123 I just modified the lines commented above!
In detail, modifying TRAIN.BURN_IN (in the cfg file) to 50 worked for me.
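For context, a darknet/YOLO-style burn-in warms the learning rate up polynomially over the first `burn_in` iterations, which matters a lot on a tiny dataset like this one (12 iterations per epoch). A minimal sketch of that schedule, assuming the conventional power-4 ramp (the exact formula in this repo may differ):

```python
def burn_in_lr(base_lr, step, burn_in, power=4):
    """Learning rate at a given iteration under a darknet-style burn-in.

    During the first `burn_in` iterations the lr ramps up from ~0 to
    base_lr as (step / burn_in) ** power; afterwards it stays at base_lr.
    """
    if step < burn_in:
        return base_lr * (step / burn_in) ** power
    return base_lr
```

With only 12 iterations per epoch, a burn-in of 50 epochs means roughly 600 iterations of ramp-up, which keeps the early updates small enough to avoid the NaN.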
Hi, I want to train on my dataset. Can you put your source code on your GitHub? Thank you for your help.