z-fabian/HUMUS-Net

size mismatch error when loading checkpoint

Closed this issue · 2 comments

Hi, I'm trying to load the pre-trained checkpoint from https://drive.google.com/file/d/14r23_yrpB3f_Jq9eOYcd_X9-dXk6Dpk3/view?usp=sharing but it throws the following error:

```
RuntimeError: Error(s) in loading state_dict for HUMUSNetModule:
    size mismatch for model.cascades.0.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.0.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.0.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for model.cascades.1.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.1.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.1.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for model.cascades.2.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.2.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.2.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for model.cascades.3.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.3.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.3.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for model.cascades.4.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.4.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.4.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for model.cascades.5.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.5.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.5.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for model.cascades.6.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.6.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.6.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
    size mismatch for model.cascades.7.model.conv_first.weight: copying a param with shape torch.Size([33, 6, 3, 3]) from checkpoint, the shape in current model is torch.Size([33, 2, 3, 3]).
    size mismatch for model.cascades.7.model.conv_last.weight: copying a param with shape torch.Size([6, 33, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 33, 3, 3]).
    size mismatch for model.cascades.7.model.conv_last.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([2]).
```

I didn't modify the eval_humus_fastmri.py script, nor did I modify the .yaml file.
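For reference, here is a minimal toy example (independent of the repo code) that produces the same kind of error, just to show it comes from mismatched conv channel sizes:

```python
import torch.nn as nn

# Toy illustration of the error above (not the actual HUMUS-Net code):
# a state_dict saved from a conv layer with 6 input channels cannot be
# loaded into a layer that was built with only 2 input channels.
saved = nn.Conv2d(in_channels=6, out_channels=33, kernel_size=3).state_dict()
current = nn.Conv2d(in_channels=2, out_channels=33, kernel_size=3)
current.load_state_dict(saved)  # raises RuntimeError: size mismatch for weight
```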

Hi perikiz,

It looks like the issue is that the initialized model expects single-slice inputs, whereas the pre-trained fastMRI models are set up for adjacent slice reconstruction with 3 slices. This is why the first and last convolution layers expect a different number of channels (2 for a single complex-valued slice vs. 6 for 3 adjacent slices). Can you please make sure that num_adj_slices is set to 3 when loading the fastMRI models? Let me know if there are further issues.
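For anyone else hitting this, a rough sketch of the check, assuming the checkpoint was saved by PyTorch Lightning and that the module's constructor accepts num_adj_slices (the exact keys and paths in the repo may differ):

```python
import torch

CKPT_PATH = "humus_fastmri.ckpt"  # placeholder: the file downloaded from the link above

# Lightning checkpoints typically store the hyperparameters the model was built with;
# for the fastMRI checkpoints this should show num_adj_slices == 3.
ckpt = torch.load(CKPT_PATH, map_location="cpu")
print(ckpt.get("hyper_parameters", {}).get("num_adj_slices"))

# Then construct the module with the matching value so conv_first/conv_last are
# built with 6 channels (3 slices x 2 real/imag channels) before loading weights:
#   model = HUMUSNetModule(num_adj_slices=3, ...)  # other args as in the repo's yaml
```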

Hi perikiz,

Thanks for pointing this out. We found a bug in loading PyTorch Lightning modules from checkpoints, which should be fixed now. Please let me know if you still have issues running the evaluation code.
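For completeness, with the fix the standard Lightning loading path should look roughly like this (the import path and checkpoint name below are placeholders, not the repo's exact API):

```python
# Rough sketch, assuming HUMUSNetModule is a standard LightningModule that calls
# save_hyperparameters(), so load_from_checkpoint restores num_adj_slices and the
# rest of the architecture from the checkpoint itself.
from pl_modules import HUMUSNetModule  # assumed import path, may differ in the repo

model = HUMUSNetModule.load_from_checkpoint("humus_fastmri.ckpt")  # placeholder path
model.eval()
```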