zkyseu/O2SFormer

Size mismatch in the forward pass of the SegDecoder

Closed this issue · 17 comments

Hello,

I am attempting to run the code with the CULane ResNet18 backbone on Python 3.7, and I get the following tensor size mismatch at x = self.conv(x) in the forward pass of the SegDecoder module:

Given groups=1, weight of size [5, 960, 1, 1], expected input[8, 896, 40, 100] to have 960 channels, but got 896 channels instead

Do you happen to know what might cause this issue? I have not changed or edited much of the code. 3 colleagues also working on this codebase face the same error.

Thank you

Further, with ResNet34 I get the following error inside focal_loss.py:

Exception has occurred: RuntimeError
sigmoid_focal_loss_forward_impl: implementation for device cuda:0 not found.
  File "/home/charles/Desktop/IAAIP-Transformer/dnlane/models/losses/focal_loss.py", line 57, in forward
    avg_factor=avg_factor)
  File "/home/charles/Desktop/IAAIP-Transformer/dnlane/models/detection_headv2.py", line 258, in loss_single_bs
    cls_loss = cls_loss + self.cls_loss(cls_pred, cls_target).sum()
  File "/home/charles/Desktop/IAAIP-Transformer/dnlane/models/detection_headv2.py", line 224, in loss
    cls_loss_list,reg_xytl_loss_list,iou_loss_list = multi_apply(self.loss_single_bs,predictions_layer,targets_list,ota_matched_row_inds,ota_matched_col_inds,assigned_labels)
  File "/home/charles/Desktop/IAAIP-Transformer/dnlane/models/detector.py", line 326, in forward_train
    loss = self.bbox_head.loss(**head_out)
  File "/home/charles/Desktop/IAAIP-Transformer/dnlane/models/detector.py", line 330, in train_step
    losses = self.forward_train(**data)
  File "/home/charles/Desktop/IAAIP-Transformer/dnlane/apis/train.py", line 251, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/charles/Desktop/IAAIP-Transformer/train_net.py", line 256, in main
    meta=meta)
  File "/home/charles/Desktop/IAAIP-Transformer/train_net.py", line 260, in <module>
    main()
RuntimeError: sigmoid_focal_loss_forward_impl: implementation for device cuda:0 not found.

zkyseu commented

@charlesalec Hi, please install mmcv-full to fix the error caused by the focal loss. You can refer to Issue.

Hello, thanks for your reply. I do have mmcv-full installed, version 1.7.1, the same version you list on the repository. If I use the latest version I get many other errors due to deprecated code in the repository, so it's not possible to fix it like this.

Any other ideas?

The only possible solution I can find in the linked thread is open-mmlab/mmdetection#6765 (comment)

Do you think this is the issue?

zkyseu commented

@charlesalec Yes. Do not install mmcv-full with pip3 install mmcv-full==1.7.0.

zkyseu commented

@charlesalec I will help you check the first problem tomorrow.

Thanks so much for your assistance. Installing with the command above and 1.7.1 yields the following error during installation:

` error: command '/usr/bin/nvcc' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> mmcv-full

`

Also, how would this fix the size mismatch with ResNet18? I think that is the main issue as well.

zkyseu commented

@charlesalec Hi, I have reproduced your first error with ResNet18. It can be fixed by modifying the backbone config: set out_indices=(0,1,2,3) instead of (1,2,3). I'm very sorry for the confusion this error caused. I have now updated the ResNet18 config and it runs. You can test it with the provided weights or train it yourself.
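The numbers in the error message line up with this fix. As a minimal sketch, assuming the SegDecoder concatenates the selected backbone feature maps along the channel dimension before its 1x1 convolution (the thread does not show that code), and using the standard ResNet18 stage widths of 64, 128, 256 and 512, the channel counts can be checked by hand. The helper name `concat_channels` is hypothetical and only for illustration:

```python
# Output channels of the four residual stages of a standard ResNet18.
RESNET18_STAGE_CHANNELS = (64, 128, 256, 512)


def concat_channels(out_indices, stage_channels=RESNET18_STAGE_CHANNELS):
    """Channels obtained by concatenating the selected stage outputs.

    Assumption: the decoder concatenates all selected feature maps
    channel-wise; this is inferred from the error message, not from the code.
    """
    return sum(stage_channels[i] for i in out_indices)


old = concat_channels((1, 2, 3))     # 128 + 256 + 512
new = concat_channels((0, 1, 2, 3))  # 64 + 128 + 256 + 512
print(old, new)  # 896 960
```

With out_indices=(1,2,3) the decoder receives 896 channels, while the convolution weight of size [5, 960, 1, 1] expects 960, which is what all four stages provide.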

zkyseu commented

Regarding the installation error (error: command '/usr/bin/nvcc' failed with exit code 1): you can refer to issue. It seems that your CUDA version is incompatible with your gcc version.

Hi, thanks so much for your help. I fixed the ResNet18 config issue.

About the CUDA and GCC incompatibility issue: which CUDA and GCC versions do you recommend installing?

Thanks a lot

I used gcc 8 with CUDA 10.2, and now I get a new error, which seems to have a different cause:

Exception has occurred: ImportError
/home/charles/miniconda3/envs/lane-det/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
  File "/home/charles/Desktop/IAAIP-Transformer/train_net.py", line 22, in <module>
    from mmdet.apis import init_random_seed, set_random_seed
ImportError: /home/charles/miniconda3/envs/lane-det/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE

Could you clarify exactly how you install mmdet?

Currently I install it via pip, at version 2.28.2.

Ok, I no longer get the error from above if I fully remove mmcv-full and then install mmcv via the command you specified: pip install mmcv==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.12/index.html

And now I get this error:

Exception has occurred: ModuleNotFoundError
No module named 'mmcv._ext'
  File "/home/charles/Desktop/IAAIP-Transformer/train_net.py", line 22, in <module>
    from mmdet.apis import init_random_seed, set_random_seed
ModuleNotFoundError: No module named 'mmcv._ext'

zkyseu commented

pip install mmcv==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.12/index.html

Hi, you must install mmcv-full instead of mmcv: pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.12/index.html
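The two import errors in this thread point at different states of the compiled extension: "No module named 'mmcv._ext'" means the extension is absent (as with the plain mmcv package), while "undefined symbol" means it is present but fails to load. A rough way to tell them apart before starting training is sketched below. The function name `check_compiled_ops` and the returned status strings are hypothetical, chosen only for this example:

```python
import importlib


def check_compiled_ops(package="mmcv"):
    """Report whether `<package>._ext` can be imported.

    Returns one of: "not installed", "missing compiled ops",
    "compiled ops failed to load", "ok".
    """
    try:
        importlib.import_module(package)
    except ImportError:
        return "not installed"
    try:
        importlib.import_module(package + "._ext")
    except ModuleNotFoundError:
        # The extension file is absent, e.g. mmcv installed without mmcv-full.
        return "missing compiled ops"
    except ImportError:
        # The extension exists but cannot be loaded, e.g. an undefined symbol.
        return "compiled ops failed to load"
    return "ok"


if __name__ == "__main__":
    print(check_compiled_ops("mmcv"))
```

ModuleNotFoundError is caught before ImportError because it is a subclass of it; swapping the two handlers would hide the distinction.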

zkyseu commented

@charlesalec My CUDA version is 10.2 and my gcc version is 7.5.0. I install mmdet as follows:

  1. pip install mmcv-full==1.7.1 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.12/index.html (alternatively, run pip install -U openmim and then mim install mmcv-full==1.7.1)
  2. pip install mmdet==2.28.2

Hope this can help you.
Best
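For other setups, the find-links URL in step 1 appears to encode the CUDA and PyTorch versions (cu102 and torch1.12 here). The sketch below builds that URL from version strings, assuming the same pattern holds for other combinations; not every combination has prebuilt wheels, so the resulting URL should be checked before use. The function name `mmcv_find_links` is hypothetical:

```python
def mmcv_find_links(cuda_version, torch_version):
    """Build the mmcv-full wheel index URL in the style used in this thread.

    cuda_version: e.g. "10.2", or None for a CPU-only build.
    torch_version: e.g. "1.12.1" or "1.12.1+cu102".
    Assumption: the URL pattern generalizes beyond cu102/torch1.12.
    """
    cu_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    # Drop any local build suffix and keep only major.minor.
    major, minor = torch_version.split("+")[0].split(".")[:2]
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cu_tag}/torch{major}.{minor}/index.html"
    )


url = mmcv_find_links("10.2", "1.12.1")
print(f"pip install mmcv-full==1.7.1 -f {url}")
```

When PyTorch is installed, the two inputs can be read from torch.version.cuda and torch.__version__, so the wheel matches the PyTorch build actually in the environment.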

Hello,

That worked, thanks so much, I appreciate your help.

Best,
Charles