Training error after batch 616/5763 epoch 0/300

Question

anisghaoui opened this issue 4 years ago · 1 comments

Hi,
I am trying to train the model as described in the README, but for some reason it crashes:

Traceback (most recent call last):
  File "train.py", line 99, in <module>
    loss, outputs = model(imgs, targets)
  File "/home/anis/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anis/Complex-YOLOv3/models.py", line 266, in forward
    x, layer_loss = module[0](x, targets, img_dim)
  File "/home/anis/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anis/Complex-YOLOv3/models.py", line 190, in forward
    ignore_thres=self.ignore_thres,
  File "/home/anis/Complex-YOLOv3/utils/utils.py", line 375, in build_targets
    best_ious, best_n = ious.max(0)
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity

After searching the web for this error, I found that it might be caused by a missing or bad input. Any idea why this would happen?

Answer 1 · 2020-05-28T09:10:15.000Z

OK, I went through the code and found that the data augmentation performed by the dataset loader might be faulty for some reason.

In train.py:

    # Get dataloader
    dataset = KittiYOLODataset(
        cnf.root_dir,
        split='train',
        mode='TRAIN',
        folder='training',
        data_aug=False,  # problems occur if set to True
        multiscale=opt.multiscale_training
    )

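This suggests that the augmentation transforms the input data but, somehow, either fails to apply the same transform to the labels or leaves the bounding boxes or key points misshapen. If a sample ends up with no usable targets, that would be consistent with the empty tensor that ious.max(0) fails on in build_targets.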
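To check this hypothesis, the labels could be validated right after augmentation. The following is only a minimal sketch: the function name filter_valid_boxes and the row layout (class_id, x, y, w, l) with normalized coordinates are assumptions made for illustration, not the repository's actual format.

```python
import math


def filter_valid_boxes(boxes, min_size=1e-3):
    """Keep only boxes that are still usable after augmentation.

    Assumed (hypothetical) row layout: (class_id, x, y, w, l), with
    x, y, w, l normalized to [0, 1]. Adapt this to the real label format.
    """
    valid = []
    for box in boxes:
        class_id, x, y, w, l = box
        if not all(math.isfinite(v) for v in (x, y, w, l)):
            continue  # NaN or inf produced by a bad transform
        if w < min_size or l < min_size:
            continue  # box collapsed to (almost) zero area
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            continue  # center moved outside the image
        valid.append(box)
    return valid


# Made-up labels standing in for the output of an augmentation step.
boxes_after_aug = [
    (0, 0.50, 0.50, 0.10, 0.20),       # fine
    (1, 1.30, 0.40, 0.10, 0.20),       # center pushed out of the image
    (0, 0.20, 0.20, 0.00, 0.20),       # zero width
    (2, float("nan"), 0.1, 0.1, 0.1),  # NaN coordinate
]
kept = filter_valid_boxes(boxes_after_aug)
print(len(kept), "of", len(boxes_after_aug), "boxes survive")
```

If a sample regularly loses all of its boxes in a check like this, the augmentation is the likely culprit.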

Anyway, the training is now working.
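If augmentation has to stay enabled, another possible workaround is to skip batches that contain no targets before they reach the model. This is a rough sketch under assumptions: nonempty_batches is a hypothetical helper, and it assumes each batch is an (imgs, targets) pair where targets supports len() over its first dimension; the real dataloader may yield a different tuple.

```python
def nonempty_batches(dataloader):
    """Yield only (imgs, targets) pairs that contain at least one target.

    Assumes `targets` supports len() over its first dimension, as a list
    or a 2-D tensor of shape (num_boxes, ...) does.
    """
    skipped = 0
    for imgs, targets in dataloader:
        if targets is None or len(targets) == 0:
            skipped += 1
            continue  # nothing to compute IoUs against, so skip the batch
        yield imgs, targets
    if skipped:
        print(f"skipped {skipped} batch(es) without targets")


# Stand-in data so the sketch runs without the real dataset.
fake_loader = [
    ("imgs0", [[0, 0.5, 0.5, 0.1, 0.2]]),
    ("imgs1", []),
    ("imgs2", None),
]
used = [imgs for imgs, _ in nonempty_batches(fake_loader)]
print(used)
```

This only hides the symptom: the samples that lose their targets are still worth inspecting.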

Edit: added script name
Edit 2: added reference links:
eriklindernoren/PyTorch-YOLOv3#110
feiyuhuahuo/Yolact_minimal#1