The size of "tgt_mask" becomes (0, 32000) during training
ZM-Zhou opened this issue · 6 comments
Hello, thanks for releasing this excellent work! (of course I gave the star~)
When I tried to train the model, I got the following error:
File "/github/SparseOcc/models/sparseocc.py", line 127, in forward
return self.forward_train(**kwargs)
File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
return old_func(*args, **kwargs)
File "/github/SparseOcc/models/sparseocc.py", line 134, in forward_train
return self.forward_pts_train(img_feats, voxel_semantics, voxel_instances, instance_class_ids, mask_camera, img_metas)
File "/github/SparseOcc/models/sparseocc.py", line 123, in forward_pts_train
return self.pts_bbox_head.loss(*loss_inputs)
File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
return old_func(*args, **kwargs)
File "/github/SparseOcc/models/sparseocc_head.py", line 55, in loss
return self.loss_single(voxel_semantics, voxel_instances, instance_class_ids, preds_dicts, mask_camera)
File "/github/SparseOcc/models/sparseocc_head.py", line 103, in loss_single
indices = self.matcher(pred, preds_dicts['class_preds'][i], voxel_instances, instance_class_ids, mask_camera)
File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/github/SparseOcc/models/matcher.py", line 146, in forward
tgt_mask = tgt_mask.view(tgt_mask.shape[0], -1)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
It seems that the prediction does not match any valid label, so the size of tgt_mask is (0, 32000).
BTW, could you release the training log of the model? It would be helpful for further checking.
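The reshape failure in the traceback above can be reproduced in isolation. The snippet below is a minimal sketch, assuming only that `tgt_mask` arrives with zero rows as reported; `flatten_masks` is a hypothetical helper for illustration and is not part of the SparseOcc code.

```python
import torch


def flatten_masks(tgt_mask: torch.Tensor) -> torch.Tensor:
    """Flatten all dimensions after the first (hypothetical helper)."""
    # flatten(1) derives the trailing size from the known shape, so it
    # does not have to infer a -1 dimension when the tensor is empty.
    return tgt_mask.flatten(1)


# Zero target masks with 32000 points each, mirroring the reported shape.
empty = torch.zeros(0, 32000)

try:
    # Same call pattern as matcher.py line 146 in the traceback.
    empty.view(empty.shape[0], -1)
    view_failed = False
except RuntimeError:
    # With 0 elements, the size of the -1 dimension cannot be inferred.
    view_failed = True

flat_shape = tuple(flatten_masks(empty).shape)
```

This only shows why `view(..., -1)` rejects an empty tensor. It does not explain why no targets survive to the matcher, which is the actual problem to fix.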
Which config?
I used r50_nuimg_704x256_8f.py
Did you modify the settings? It's OK on my server.
Try modifying line 210 in loaders/pipelines/loading.py:
results['voxel_semantics'] = semantics.astype(np.int)
results['voxel_instances'] = final_instances.astype(np.int)
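For readers on newer NumPy: `np.int` was a deprecated alias for Python's built-in `int` and was removed in NumPy 1.24, so the cast above raises `AttributeError` there. The following is a hedged sketch of the same cast with an explicit dtype. The `uint8` input arrays are stand-ins chosen for illustration, not the real pipeline data, and `np.int64` assumes the typical 64-bit Linux setup where `np.int` resolved to a 64-bit integer.

```python
import numpy as np

# Stand-in arrays (assumption); the real ones come from the loading pipeline.
semantics = np.array([[0, 17], [3, 17]], dtype=np.uint8)
final_instances = np.array([[0, 255], [1, 255]], dtype=np.uint8)

results = {}
# Same cast as the suggested fix, with an explicit dtype instead of np.int.
results['voxel_semantics'] = semantics.astype(np.int64)
results['voxel_instances'] = final_instances.astype(np.int64)
```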
Sorry, I cannot reproduce the error. Could you try the latest codebase?
Try modifying line 210 in loaders/pipelines/loading.py:
results['voxel_semantics'] = semantics.astype(np.int)
results['voxel_instances'] = final_instances.astype(np.int)
It works!! Thank you so much.
BTW, in the latest codebase, these lines have moved to lines 219-220.