While training, the precision, recall and fmeasure values are 0 and all the losses are NaN

Question

himanshurawlani opened this issue 5 years ago · 5 comments

When I execute the following code:

gen_train = InputGenerator(gt_util_train, prior_util, batch_size, model.image_size, augmentation=False)
gen_val = InputGenerator(gt_util_val, prior_util, batch_size, model.image_size, augmentation=False)
tmp_inputs, tmp_targets = next(gen_train.generate())

I get the following RuntimeWarning:

ssd_detectors/tbpp_utils.py:83: RuntimeWarning: divide by zero encountered in log
  offsets_rboxs[prior_mask,4] = np.log(gt_rboxes[:,4] / priors_wh[:,1]) / variances_wh[:,1]

I get the same warning when I try to train the TBPP512 or TBPP512_dense model. During training, the precision and recall metrics are 0, and conf_loss and loc_loss are NaN. Is this caused by the warning above? If not, how can I debug it? Here are the metrics for the first epoch:

Epoch 1/100
5/5 [==============================] - 55s 11s/step - loss: nan - conf_loss: nan - loc_loss: nan - precision: 0.0000e+00 - recall: 0.0000e+00 - accuracy: 0.8000 - fmeasure: 0.0000e+00 - num_pos: 43.2000 - num_neg: 611588.8000 - val_loss: nan - val_conf_loss: nan - val_loc_loss: nan - val_precision: 0.0000e+00 - val_recall: 0.0000e+00 - val_accuracy: 1.0000 - val_fmeasure: 0.0000e+00 - val_num_pos: 80.5000 - val_num_neg: 611551.5000

I have used a custom dataset with 1 class only and modified it according to the format required by GTUtility. I've also verified the values in gt_util_train and gt_util_val and they seem to be correct.

Answer 1 · 2019-07-18T07:40:33.000Z

I guess your dataset contains bounding boxes with zero width or height.
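One way to check for such boxes is sketched below. It assumes the boxes are available as an (N, 4) array in (xmin, ymin, xmax, ymax) order; that layout and the helper name `find_degenerate_boxes` are only illustrative and are not part of GTUtility, so the column indices may need adapting to the actual data format.

```python
import numpy as np

def find_degenerate_boxes(boxes, eps=1e-6):
    """Return row indices of boxes with non-positive width or height.

    Hypothetical helper: assumes rows are (xmin, ymin, xmax, ymax).
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    # Also flag rows that contain NaN or inf coordinates.
    non_finite = ~np.isfinite(boxes).all(axis=1)
    bad = (widths <= eps) | (heights <= eps) | non_finite
    return np.flatnonzero(bad)

example = np.array([
    [0.10, 0.10, 0.30, 0.20],  # valid box
    [0.50, 0.50, 0.50, 0.60],  # zero width
    [0.70, 0.40, 0.60, 0.50],  # xmin > xmax, negative width
])
bad_rows = find_degenerate_boxes(example)
print(bad_rows)
```

Running this over every image's boxes and printing the offending indices should show quickly whether the annotations themselves are the source of the warning.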

0 is the 'background' class and 1 is the 'text' class. Therefore you have at least 2 classes in the GTUtility...

Answer 2 · 2019-07-19T11:16:31.000Z

Okay, I will check my dataset. By "1 class" I meant one positive class. I also want to train this model on multiple classes (say, 5 classes). Could you help me with how to go about it?

I tried modifying the utils scripts for multiple classes and converted my dataset to the format required by GTUtility (Images/ and a .mat file), but the training would not start.

After some debugging, I realized that the output shape of the last layer had changed from (x, 19) (for 1 class) to (x, 23) (for 5 classes). What needs to be changed to train the TBPP512 or TBPP512_dense model on 5 classes?

Answer 3 · 2019-08-02T12:08:03.000Z

I have checked my dataset; there are no bounding boxes with zero width or height.

Answer 4 · 2019-08-02T14:05:37.000Z

Maybe a type issue. The model input should be float32. If I remember it right, I had similar issues with float64. Otherwise, I have no idea...
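A quick way to rule this out is to cast each batch to float32 and check it for NaN or inf before it reaches the model. This is only a sketch: `as_float32_checked` is a made-up helper name, and the array below merely stands in for a generator batch such as `tmp_inputs`.

```python
import numpy as np

def as_float32_checked(batch, name="batch"):
    """Cast to float32 and fail early on NaN or inf values.

    Hypothetical helper, not part of ssd_detectors.
    """
    arr = np.asarray(batch, dtype=np.float32)
    if not np.isfinite(arr).all():
        raise ValueError(f"{name} contains NaN or inf values")
    return arr

# Stand-in for a float64 image batch coming out of a generator.
fake_inputs = np.ones((2, 8, 8, 3), dtype=np.float64)
checked = as_float32_checked(fake_inputs, name="inputs")
print(checked.dtype)
```

Applying the same check to the targets would also reveal whether the NaN is already present in the encoded ground truth rather than produced by the network.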

Answer 5 · 2019-08-02T14:09:59.000Z

btw, log of a negative number results in NaN...
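For reference, here is how NumPy behaves in the two cases. A zero argument gives -inf and the "divide by zero encountered in log" warning quoted in the question, while a negative argument gives NaN with an "invalid value encountered in log" warning. The ratio values are made up for illustration.

```python
import numpy as np

# Made-up ratios: a valid one, a zero one, and a negative one.
ratios = np.array([1.0, 0.0, -0.5])

# Silence the warnings here so only the resulting values are shown.
with np.errstate(divide="ignore", invalid="ignore"):
    logs = np.log(ratios)
print(logs)

# Turning the warning into an exception helps locate the offending sample.
caught = False
try:
    with np.errstate(divide="raise"):
        np.log(np.array([0.0]))
except FloatingPointError:
    caught = True
print(caught)
```

Wrapping the target encoding step in `np.errstate(divide="raise", invalid="raise")` should therefore stop at the first sample that produces a zero or negative ratio.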