RuntimeError: Sizes of tensors must match except in dimension 2. Got 136 and 135 (The offending index is 0)

Question

Closed this issue 2 years ago · 5 comments

I hit this problem when running the command python train.py --experiment FutureDetection --model forecast_n0.
I've changed the voxel_size to [0.2, 0.2] in line 101, and to [0.1, 0.1, 0.2] in line 162.
The full traceback is as follows:
Traceback (most recent call last):
  File "./tools/train.py", line 143, in <module>
    main()
  File "./tools/train.py", line 138, in main
    logger=logger,
  File "/home/lu/Workspace/FutureDet/det3d/torchie/apis/train.py", line 358, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, cfg=cfg, local_rank=cfg.local_rank)
  File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 584, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 450, in train
    self.model, data_batch, train_mode=True, **kwargs
  File "/home/lu/Workspace/FutureDet/det3d/torchie/trainer/trainer.py", line 392, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 50, in forward
    x, _ = self.extract_feat(data)
  File "/home/lu/Workspace/FutureDet/det3d/models/detectors/voxelnet.py", line 29, in extract_feat
    x = self.neck(x)
  File "/home/lu/anaconda3/envs/futuredet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lu/Workspace/FutureDet/det3d/models/necks/rpn.py", line 157, in forward
    x = torch.cat(ups, dim=1)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 136 and 135 (The offending index is 0)

The error is raised in the forward function in rpn.py:

    def forward(self, x):
        ups = []
        for i in range(len(self.blocks)):
            x = self.blocks[i](x)
            if i - self._upsample_start_idx >= 0:
                ups.append(self.deblocks[i - self._upsample_start_idx](x))
        if len(ups) > 0:
            x = torch.cat(ups, dim=1)
        return x

The x.shape is torch.Size([1, 256, 135, 135]),
and the structure of the self.blocks network is

    ModuleList(
      (0): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), bias=False)
        (2): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(128, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
      (1): Sequential(
        (0): ZeroPad2d(padding=(1, 1, 1, 1), value=0.0)
        (1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (2): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (3): ReLU()
        (4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (5): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (6): ReLU()
        (7): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (8): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (9): ReLU()
        (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (11): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (12): ReLU()
        (13): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (14): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (15): ReLU()
        (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (17): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (18): ReLU()
      )
    )

The structure of the self.deblocks network is

    ModuleList(
      (0): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=0.001, momentum=0.01, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )

The debug information is as follows:

Is this issue caused by the network parameter settings, or by the change to voxel_size?

Answer 1 · 2023-06-14T17:09:40.000Z

I think I understand what the issue is. This error is because you changed the voxel_size. When you change the voxel size, you also need to adjust the grid size. Here's a small code snippet that should help you compute the right values for the config.

    voxel_size = [0.075, 0.075, 0.2]
    point_cloud_range = [-54, -54, -3, 54, 54, 3]

    # sparse_shape is ordered (z, y, x); the z axis gets one extra cell.
    sparse_shape = [
        int((abs(point_cloud_range[2]) + abs(point_cloud_range[5])) / voxel_size[2]) + 1,
        int((abs(point_cloud_range[1]) + abs(point_cloud_range[4])) / voxel_size[1]),
        int((abs(point_cloud_range[0]) + abs(point_cloud_range[3])) / voxel_size[0]),
    ]

    # grid_size is ordered (x, y, z).
    grid_size = [
        int((abs(point_cloud_range[0]) + abs(point_cloud_range[3])) / voxel_size[0]),
        int((abs(point_cloud_range[1]) + abs(point_cloud_range[4])) / voxel_size[1]),
        int((abs(point_cloud_range[2]) + abs(point_cloud_range[5])) / voxel_size[2]),
    ]

This example config (unrelated to FutureDet) should give you an idea where you need to plug in the new values.
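
For the shapes reported in the question, the 135 vs. 136 mismatch can be traced with plain output-size arithmetic. The sketch below is only an illustration, not FutureDet code: the helper names are made up here, and the layer parameters are read off the printed self.blocks and self.deblocks structures. It uses the standard output-size formulas for Conv2d and ConvTranspose2d with dilation 1 and no output padding.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel


def neck_branch_sizes(size):
    """Spatial sizes of the two tensors that end up in torch.cat(ups, dim=1)."""
    # blocks[0]: ZeroPad2d(1) + 3x3 conv, stride 1; the later convs keep the size.
    b0 = conv2d_out(size, kernel=3, stride=1, padding=1)
    # deblocks[0]: 1x1 conv, stride 1.
    up0 = conv2d_out(b0, kernel=1)
    # blocks[1]: ZeroPad2d(1) + 3x3 conv, stride 2.
    b1 = conv2d_out(b0, kernel=3, stride=2, padding=1)
    # deblocks[1]: 2x2 transposed conv, stride 2.
    up1 = conv_transpose2d_out(b1, kernel=2, stride=2)
    return up0, up1


print(neck_branch_sizes(135))  # (135, 136): the sizes in the error message
print(neck_branch_sizes(180))  # (180, 180): an even input size lines up
```

With these layers, an odd input size is rounded up by the stride-2 block and then doubled, so the two branches only agree when the input height and width are even.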

Answer 2 · 2023-06-15T07:30:03.000Z

Yes, the issue was caused by the change of voxel_size. But I don't know where to adjust the grid size, which doesn't seem to appear in nusc_centerpoint_forecase_n0_detection.py.
I solved the problem by changing the voxel_size in line 162 to [0.96, 0.96, 0.2] (I tried different combinations of values at random) and keeping the voxel_size in line 101 as [0.75, 0.75], and it worked.
By the way, may I ask how long it takes to forecast a trajectory with FutureDet? We want to deploy it on a mobile robot for navigation, so we are particularly concerned about real-time performance.
Thank you very much.

Answer 3 · 2023-06-28T14:01:42.000Z

I found that a voxel_size of [0.96, 0.96, 0.2] is too large to train a model with good results. [0.125, 0.125, 0.2] may be a better choice. Also change the voxel_size in line 101 to [0.125, 0.125]: I found that if the voxel_size differs between the two locations, the prediction result is empty, so it's best to keep the voxel_size the same in both places.
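
A rough way to screen candidate voxel sizes before training is sketched below. It rests on assumptions that are not confirmed in this thread: the point cloud range is taken from the snippet in Answer 1, and the backbone is assumed to shrink the x/y grid by a factor of 8 (a guess that happens to reproduce the 135 x 135 feature map reported for a 0.1 voxel). The function names are hypothetical.

```python
# Assumption: x/y range taken from the snippet in Answer 1.
POINT_CLOUD_RANGE = [-54, -54, -3, 54, 54, 3]
# Assumption: total x/y downsampling between the voxel grid and the neck input.
ASSUMED_DOWNSAMPLE = 8


def bev_size(voxel_xy, pc_range=POINT_CLOUD_RANGE, downsample=ASSUMED_DOWNSAMPLE):
    """Approximate width of the feature map that reaches the neck."""
    extent = pc_range[3] - pc_range[0]
    # Round first so float noise (e.g. 1079.9999999) cannot truncate to 1079.
    grid = int(round(extent / voxel_xy, 6))
    return grid // downsample


def neck_compatible(voxel_xy):
    """The stride-2 block followed by a 2x upsample needs an even input size."""
    return bev_size(voxel_xy) % 2 == 0


for v in (0.075, 0.1, 0.125, 0.96):
    print(v, bev_size(v), neck_compatible(v))
```

Under these assumptions 0.1 gives an odd size (135) and fails, while 0.075, 0.125 and 0.96 give even sizes, which matches what was observed above. The check says nothing about accuracy; it only flags sizes that would break the concatenation in rpn.py.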

Answer 4 · 2023-12-21T11:00:47.000Z

Sir, hello! Could you tell me the real-time performance of the model you tested? I measured 0.8s per frame, which feels very slow. Is that because of the multiple NMS passes, or is there something wrong with my test?

Answer 5 · 2023-12-26T08:34:15.000Z

@taylorlulu Sir, hello! Could you tell me the real-time performance of the model you tested? I measured 0.85s per frame, which feels very slow. Is that because of the multiple NMS passes, or is there something wrong with my test?