RuntimeError: stack expects a non-empty TensorList
cccccccccy opened this issue · 6 comments
Hi, I preprocessed the COCO2017 dataset with python datasets/register_coco_edge.py. But when I trained the network with python train_net.py --num-gpus 1 --config-file configs/Dance_R_50_3x.yaml, I ran into the following error:
ERROR [05/08 10:49:58 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/caoyang/detectron2/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/caoyang/detectron2/detectron2/engine/train_loop.py", line 214, in run_step
    loss_dict = self.model(data)
  File "/home/caoyang/anaconda3/envs/dance/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/caoyang/dance/core/modeling/edge_snake/dance.py", line 140, in forward
    features, proposals, (gt_sem_seg, [gt_instances, images.image_sizes])
  File "/home/caoyang/anaconda3/envs/dance/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/caoyang/dance/core/modeling/edge_snake/edge_det.py", line 270, in forward
    _, poly_loss = self.refine_head(snake_input, None, targets[1])
  File "/home/caoyang/anaconda3/envs/dance/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/caoyang/dance/core/modeling/edge_snake/snake_head.py", line 1881, in forward
    training_targets = self.compute_targets_for_polys(gt_instances, image_sizes)
  File "/home/caoyang/dance/core/modeling/edge_snake/snake_head.py", line 1232, in compute_targets_for_polys
    init_ex_targets = torch.stack(init_ex_targets, dim=0)
RuntimeError: stack expects a non-empty TensorList
I guess the reason is that some image contains no target, or its targets are not annotated. Could you tell me how to solve this problem?
Hi @cccccccccy, when does this error happen? Almost immediately after you launch the training script, or after a while? If it happens after a while, does it happen after roughly the same amount of time on every run?
Hi, this error occurs in the first epoch, at the same point every time.
[05/11 15:22:02 d2.engine.hooks]: Overall training speed: 1796 iterations in 0:13:25 (0.4488 s / it)
[05/11 15:22:02 d2.engine.hooks]: Total training time: 0:13:28 (0:00:02 on hooks)
I also have another problem. I attached the snake module from the dance model (without the att module, using the same preprocessing and loss design) to my own initial contour prediction model. My initial contour is closer to the ground truth than dance's initial contour, so the snake loss (0, 1, 2) values are very small (about 0.01-0.02). Could you tell me how to adapt the training for this situation?
If the error occurs at the same point in the first epoch every time, it is likely that a certain image has a bad annotation. You need to identify it and filter it out.
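One way to track down such an image is to scan the annotation file for images that have no usable instance polygon. The sketch below assumes a standard COCO-format JSON (top-level `images` and `annotations`, polygon `segmentation` lists); whether the edge-preprocessed annotations in this repository follow exactly that layout is an assumption. The helper names (`load_coco`, `has_valid_polygon`, `find_images_without_polygons`, `drop_images`) and the rule that a polygon needs at least three points are hypothetical and only for illustration.

```python
import json
from collections import defaultdict


def load_coco(path):
    """Load a COCO-format annotation file into a dict."""
    with open(path) as f:
        return json.load(f)


def has_valid_polygon(ann, min_points=3):
    """Return True if the annotation has at least one polygon with enough points."""
    seg = ann.get("segmentation")
    # Crowd annotations store RLE (a dict) rather than a list of polygons.
    if not isinstance(seg, list):
        return False
    # Each polygon is a flat [x0, y0, x1, y1, ...] list: 2 values per point.
    return any(isinstance(p, list) and len(p) >= 2 * min_points for p in seg)


def find_images_without_polygons(coco):
    """List ids of images with no non-crowd annotation carrying a valid polygon."""
    valid_count = defaultdict(int)
    for ann in coco.get("annotations", []):
        if ann.get("iscrowd", 0) == 0 and has_valid_polygon(ann):
            valid_count[ann["image_id"]] += 1
    return [img["id"] for img in coco.get("images", []) if valid_count[img["id"]] == 0]


def drop_images(coco, bad_ids):
    """Return a copy of the dataset dict without the given images and their annotations."""
    bad = set(bad_ids)
    cleaned = dict(coco)
    cleaned["images"] = [i for i in coco["images"] if i["id"] not in bad]
    cleaned["annotations"] = [a for a in coco["annotations"] if a["image_id"] not in bad]
    return cleaned
```

Running `find_images_without_polygons(load_coco(...))` on your own annotation file gives a list of candidate image ids to inspect. If one of them turns out to be the culprit, `drop_images` produces a filtered dict that can be written back with `json.dump` and registered in place of the original.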
As for the small snake loss: it seems that your initial contour prediction is better than directly sampling on the box. This makes the later refinement (the snake predicting offsets) easier, which results in a very small loss. A simple solution is to multiply the snake loss by a coefficient to magnify it.
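A minimal sketch of such a coefficient, assuming the model returns its losses as a dict keyed by name (as in the `loss_dict = self.model(data)` line of the traceback). The key prefix `loss_snake`, the helper name `scale_snake_losses` and the weight value are hypothetical; check the actual key names in your own loss dict.

```python
def scale_snake_losses(loss_dict, weight, prefix="loss_snake"):
    """Return a new dict where every loss whose key starts with `prefix` is multiplied by `weight`."""
    return {
        name: value * weight if name.startswith(prefix) else value
        for name, value in loss_dict.items()
    }


# Example with plain floats; the same code works when the values are tensors.
losses = {"loss_snake_0": 0.02, "loss_snake_1": 0.01, "loss_cls": 0.5}
scaled = scale_snake_losses(losses, weight=10.0)
total = sum(scaled.values())
```

The weight is best treated as a hyperparameter: picking it so that the snake terms end up on a similar order of magnitude to the other losses is one reasonable starting point.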
As an additional suggestion, you can visualise your initial contour prediction together with the ground-truth offset that you want the snake module to learn. Then analyse what kind of loss is suitable for learning this offset, how to re-scale properly to balance the losses, and so on.
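Before plotting anything, it can help to put numbers on those offsets. The sketch below assumes both contours are lists of `(x, y)` vertices that are already matched one-to-one, which is an assumption about your preprocessing; the function name `offset_stats` is hypothetical.

```python
import math


def offset_stats(init_contour, gt_contour):
    """Summarise the per-vertex offsets (gt - init) between two matched contours."""
    if not init_contour or len(init_contour) != len(gt_contour):
        raise ValueError("contours must be non-empty and have the same number of vertices")
    offsets = [(gx - ix, gy - iy) for (ix, iy), (gx, gy) in zip(init_contour, gt_contour)]
    norms = [math.hypot(dx, dy) for dx, dy in offsets]
    return {
        "mean": sum(norms) / len(norms),  # average offset length
        "max": max(norms),                # largest single-vertex offset
        "offsets": offsets,               # raw (dx, dy) pairs, e.g. for plotting
    }
```

Collecting these statistics over a batch gives a rough idea of the magnitude the snake module has to regress, which in turn can inform the choice of loss and of any re-scaling factor.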
Thanks for your suggestions!