The size of "tgt_mask" becomes (0, 32000) during training

Question

ZM-Zhou opened this issue 8 months ago · 6 comments

Hello, thanks for releasing this excellent work! (Of course, I starred the repo.)
When I tried to train the model, I got the following error:

  File "/github/SparseOcc/models/sparseocc.py", line 127, in forward
    return self.forward_train(**kwargs)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
    return old_func(*args, **kwargs)
  File "/github/SparseOcc/models/sparseocc.py", line 134, in forward_train
    return self.forward_pts_train(img_feats, voxel_semantics, voxel_instances, instance_class_ids, mask_camera, img_metas)
  File "/github/SparseOcc/models/sparseocc.py", line 123, in forward_pts_train
    return self.pts_bbox_head.loss(*loss_inputs)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
    return old_func(*args, **kwargs)
  File "/github/SparseOcc/models/sparseocc_head.py", line 55, in loss
    return self.loss_single(voxel_semantics, voxel_instances, instance_class_ids, preds_dicts, mask_camera)
  File "/github/SparseOcc/models/sparseocc_head.py", line 103, in loss_single
    indices = self.matcher(pred, preds_dicts['class_preds'][i], voxel_instances, instance_class_ids, mask_camera)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/anaconda3/envs/sparseocc/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/github/SparseOcc/models/matcher.py", line 146, in forward
    tgt_mask = tgt_mask.view(tgt_mask.shape[0], -1)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

It seems that the prediction does not match any valid label, so tgt_mask ends up with size (0, 32000) and the view call in matcher.py fails.
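The failing call can be reproduced in isolation. The sketch below assumes only that tgt_mask is a PyTorch tensor whose first dimension counts target instances. It shows that `view(n, -1)` cannot infer the second dimension once n is 0, and that passing the flattened size explicitly avoids the ambiguity. The helper `flatten_target_masks` is hypothetical and is not the SparseOcc code. Skipping the matcher entirely for samples with no instances is another option. Neither approach is confirmed in this thread as the upstream fix.

```python
import math

import torch


def flatten_target_masks(tgt_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: flatten per-instance masks to (num_instances, N).

    view(n, -1) raises when n == 0, because any value for -1 gives zero
    elements. Computing the trailing size explicitly also works for n == 0.
    """
    num_instances = tgt_mask.shape[0]
    per_instance = math.prod(tgt_mask.shape[1:])  # product of trailing dims (Python 3.8+)
    return tgt_mask.reshape(num_instances, per_instance)


# Reproduce the error from the traceback with zero target instances.
empty_masks = torch.zeros(0, 32000)
try:
    empty_masks.view(empty_masks.shape[0], -1)
except RuntimeError as err:
    print("view failed:", err)

print(flatten_target_masks(empty_masks).shape)           # torch.Size([0, 32000])
print(flatten_target_masks(torch.zeros(3, 4, 5)).shape)  # torch.Size([3, 20])
```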
By the way, could you release the training log of the model? It would be helpful for further checking.

Answer 1 · 2024-04-18T03:04:29.000Z

Which config are you using?

Answer 2 · 2024-04-18T04:39:34.000Z

I used r50_nuimg_704x256_8f.py

Answer 3 · 2024-04-18T05:23:53.000Z

Did you modify the settings? It runs fine on my server.

Answer 4 · 2024-04-18T07:07:15.000Z

Try modifying line 210 in loaders/pipelines/loading.py:

results['voxel_semantics'] = semantics.astype(np.int)
results['voxel_instances'] = final_instances.astype(np.int)
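Note that `np.int` was only an alias for Python's built-in `int`. NumPy deprecated it in 1.20 and removed it in 1.24, so on newer NumPy releases the lines above raise AttributeError. The following is a minimal sketch of the same cast with an explicit dtype. It assumes `semantics` and `final_instances` are the NumPy arrays named in the answer. The wrapper function and the sample arrays are hypothetical.

```python
import numpy as np


def cast_voxel_labels(results, semantics, final_instances):
    """Hypothetical wrapper around the two assignments suggested above.

    np.int64 matches what np.int produced on Linux (the builtin int maps to
    a 64-bit integer there) and still exists in current NumPy releases.
    """
    results['voxel_semantics'] = semantics.astype(np.int64)
    results['voxel_instances'] = final_instances.astype(np.int64)
    return results


# Hypothetical uint8 inputs, used only to show the dtype change.
semantics = np.array([[0, 17], [16, 3]], dtype=np.uint8)
final_instances = np.array([[0, 2], [1, 0]], dtype=np.uint8)

results = cast_voxel_labels({}, semantics, final_instances)
print(results['voxel_semantics'].dtype, results['voxel_instances'].dtype)  # int64 int64
```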

Answer 5 · 2024-04-19T06:32:38.000Z

Sorry, I cannot reproduce the error. Could you try the latest codebase?

Answer 6 · 2024-04-19T16:24:04.000Z

Quoting Answer 4: "Try modifying line 210 in loaders/pipelines/loading.py" (casting voxel_semantics and voxel_instances with astype(np.int)).

It works! Thank you so much.
By the way, in the latest codebase these lines have moved to lines 219-220.