RuntimeError: Trying to create tensor with negative dimension
Rusteam opened this issue · 5 comments
Hi there,
while training I get the following error at test stage after some number of epochs:
Traceback (most recent call last):
File "/usr/src/app/pipelines/yolor/../../src/models/yolor/train.py", line 537, in <module>
train(hyp, opt, device, tb_writer, wandb)
File "/usr/src/app/pipelines/yolor/../../src/models/yolor/train.py", line 336, in train
results, maps, times = test.test(opt.data,
File "/usr/src/app/src/models/yolor/test.py", line 134, in test
output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres)
File "/usr/src/app/src/models/yolor/utils/general.py", line 341, in non_max_suppression
i = torch.ops.torchvision.nms(boxes, scores, iou_thres)
File "/usr/local/lib/python3.9/dist-packages/torch/_ops.py", line 142, in __call__
return self._op(*args, **kwargs or {})
RuntimeError: Trying to create tensor with negative dimension -726820594: [-726820594]
My env:
torch=='1.12.0.dev20220314+cu102'
torchvision=='0.13.0.dev20220314+cu102'
python 3.9.10
Hi there,
I got the same error message as you with batch_size=2.
I think the error is related to batch_size, because when I changed it to batch_size=3, the error disappeared.
I don't know the full reason for the error, but I can train on my dataset this way.
If I find the reason, I will post it here.
Hope this helps.
My env:
python 3.7
torch==1.7.0+cu101
torchvision==0.8.1+cu101
torchaudio==0.7.0
GPU: RTX2080Ti 11G
I'm not sure it is about batch size, because it happens after some number of epochs. Say it has been training and testing fine for 15 epochs, and then it suddenly throws this error.
Also, the value looks like a box coordinate, and it should not be that large.
Update: I debugged the code and found the cause of this error in ./utils/general.py.
In lines 320-350 of that file, you can see the following code:
320 # Box (center x, center y, width, height) to (x1, y1, x2, y2)
321 box = xywh2xyxy(x[:, :4])
... ...
347 # Batched NMS
348 c = x[:, 5:6] * (0 if agnostic else max_wh) # classes
349 boxes, scores = x[:, :4] + c, x[:, 4] # boxes (offset by class), scores
350 i = torch.ops.torchvision.nms(boxes, scores, iou_thres)
You can debug this code while training your model. At line 350 you will see that the boxes tensor is very large, and that boxes (line 350) and box (line 321) are float32 and float16 on the GPU, so I think the error happens here.
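One possible explanation for the negative number, offered as an assumption and not confirmed anywhere in this thread: if the NMS kernel allocates a mask of about n * ceil(n / 64) entries and computes that size with 32-bit integers, a very large number of boxes n would wrap around to a negative value. A minimal sketch of that arithmetic in plain Python (the function names and the block size of 64 are hypothetical, chosen only for illustration):

```python
def wrap_int32(value: int) -> int:
    """Interpret an arbitrary Python int as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def mask_size_int32(num_boxes: int, block: int = 64) -> int:
    """Size of a hypothetical n * ceil(n / block) mask, computed in int32."""
    col_blocks = (num_boxes + block - 1) // block  # ceil(num_boxes / block)
    return wrap_int32(num_boxes * col_blocks)


if __name__ == "__main__":
    # Assumed box counts, only to show where the sign flips.
    for n in (10_000, 300_000, 400_000):
        print(n, mask_size_int32(n))
```

Under this assumption the size stays positive at 300,000 boxes and turns negative at 400,000, so anything that reduces the number of boxes reaching NMS would avoid the problem.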
My solution:
I changed conf_thres on line 35 of ./test.py, as follows:
31 def test(data,
32 weights=None,
33 batch_size=16,
34 imgsz=640,
35 conf_thres=0.001,
36 iou_thres=0.6, # for NMS
37 save_json=False,
38 single_cls=False,
39 augment=False,
40 verbose=False,
41 model=None,
42 dataloader=None,
43 save_dir=Path(''), # for saving images
44 save_txt=False, # for auto-labelling
45 save_conf=False,
46 plots=True,
47 log_imgs=0): # number of logged images
# After modification.
31 def test(data,
32 weights=None,
33 batch_size=16,
34 imgsz=640,
35 conf_thres=0.01,
36 iou_thres=0.6, # for NMS
37 save_json=False,
38 single_cls=False,
39 augment=False,
40 verbose=False,
41 model=None,
42 dataloader=None,
43 save_dir=Path(''), # for saving images
44 save_txt=False, # for auto-labelling
45 save_conf=False,
46 plots=True,
47 log_imgs=0): # number of logged images
This change eliminates the error.
Hope this helps, @Rusteam.
Did it help?
Yes, the method helped me.
I used my own dataset with YOLOR.
My dataset is for small-object detection and I changed YOLOR's architecture, which is why so many boxes are produced.
Raising conf_thres removes a lot of boxes. You should adjust conf_thres according to your dataset and model architecture.
Note that you should not let the number of boxes become too small, since they are still used by the model. If you want definite values for these parameters, you can refer to the official example.
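To get a feel for how conf_thres controls the number of boxes passed to NMS, here is a small sketch in plain Python. The helper name select_candidates and the max_boxes cap are hypothetical and not part of YOLOR, and a plain list stands in for the score tensor:

```python
from typing import List, Optional


def select_candidates(scores: List[float],
                      conf_thres: float,
                      max_boxes: Optional[int] = None) -> List[int]:
    """Return indices of boxes whose score exceeds conf_thres, best first.

    max_boxes is an optional, hypothetical safety cap on how many
    candidates are handed to NMS.
    """
    kept = [i for i, s in enumerate(scores) if s > conf_thres]
    kept.sort(key=lambda i: scores[i], reverse=True)  # highest score first
    if max_boxes is not None:
        kept = kept[:max_boxes]
    return kept


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    scores = [0.0005, 0.002, 0.5, 0.02, 0.9]
    print(len(select_candidates(scores, conf_thres=0.001)))
    print(len(select_candidates(scores, conf_thres=0.01)))
    print(select_candidates(scores, conf_thres=0.001, max_boxes=2))
```

A higher threshold drops the low-confidence boxes first, so it shrinks the NMS input without touching the strongest detections. A cap like max_boxes is another way to bound the input size, but whether it suits your model is something to check on your own data.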