iArunava/ENet-Real-Time-Semantic-Segmentation

Error testing own model

oconnor127 opened this issue · 12 comments

Hey,

testing your model works without issues. If I want to apply my own trained model I get the following error:

bryan@bryan:~/Desktop/ENet-Real-Time-Semantic-Segmentation$ python3 init.py --mode test -m /home/bryan/Desktop/ENet-Real-Time-Semantic-Segmentation/ckpt-enet-90-23.889332741498947.pth -i /home/bryan/Desktop/ENet-Real-Time-Semantic-Segmentation/training/image_2/000000_10.png
Traceback (most recent call last):
  File "init.py", line 153, in <module>
    test(FLAGS)
  File "/home/bryan/Desktop/ENet-Real-Time-Semantic-Segmentation/test.py", line 24, in test
    enet.load_state_dict(checkpoint['state_dict'])
  File "/home/bryan/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ENet:
        size mismatch for fullconv.weight: copying a param with shape torch.Size([16, 102, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 12, 3, 3]).

This even happens if I test the model on the data it was trained on, so it can't be a format issue. Do you know what the problem is?
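One way to see what the saved checkpoint actually contains is something like this (it is the same 'state_dict' key that test.py loads, and the path is just the checkpoint from my command above):

import torch

# Load the checkpoint on the CPU and print the shape of the final layer's
# weights as they were saved during training.
checkpoint = torch.load('ckpt-enet-90-23.889332741498947.pth', map_location='cpu')
print(checkpoint['state_dict']['fullconv.weight'].shape)
# prints torch.Size([16, 102, 3, 3]) for my checkpoint, which is exactly the
# shape the error reports as mismatching the model's torch.Size([16, 12, 3, 3])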

Best

Can you print the shape of your input, please?

The input is a 1242x375 image with 3 channels (RGB). The ground truth is a 1242x375 grayscale image.

I know that; what I'm trying to find out is whether you are using channels first or channels last.

I don't know; I just trained your model using your command:

python3 init.py --mode train -iptr path/to/train/input/set/ -lptr /path/to/label/set/ and, additionally, the validation and test set paths

So I assume the shape is in a standard format, like 1242x375x3 (since I just used your specified command without modifying anything).
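For what it's worth, a quick check with PIL/numpy (paths from my command above; the semantic folder name comes from the KITTI download) should show what the data loader actually sees, since I haven't modified anything:

from PIL import Image
import numpy as np

# Load one training image and its label exactly as they come from KITTI.
img = np.array(Image.open('training/image_2/000000_10.png'))
lbl = np.array(Image.open('training/semantic/000000_10.png'))
print(img.shape)  # expected: (375, 1242, 3) -> height, width, channels (channels last)
print(lbl.shape)  # expected: (375, 1242) for the grayscale label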

If I understood you correctly...?

Can you share some code so that I can reproduce this? Which dataset did you use?

As mentioned, I did not write any code; I just trained your model using your commands in the terminal. Should I add them? I used the KITTI semantic segmentation dataset:
http://www.cvlibs.net/datasets/kitti/eval_semseg.php?benchmark=semantics2015
Click on "Download label for semantic and instance segmentation"
In the "training" folder I've split the "image_2" as well as "semantic" into training, valid, test set.

This implementation was used on the Cityscapes and CamVid datasets. I don't remember by heart the dimensions of a single image in KITTI.

Hi, I have the same problem. I think it is about the number of classes used during the training phase, which is 102 by default. In fact, the error I get is:

Traceback (most recent call last):
  File "init.py", line 153, in <module>
    test(FLAGS)
  File "C:\Users\me\Desktop\Tesi\TEST\ENet-Real-Time-Semantic-Segmentation\test.py", line 24, in test
    enet.load_state_dict(checkpoint['state_dict'])
  File "C:\Users\me\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ENet:
        size mismatch for fullconv.weight: copying a param with shape torch.Size([16, 2, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 12, 3, 3]).

where 2 is the number of classes I've defined during the training phase.
Still no solution?
Thanks

@oconnor127 I just solved the problem by changing one line in the test.py file (line 23) from enet = ENet(12) to enet = ENet(2), since in my case I trained a model with two classes. For sure there is a more elegant way to solve this, but for now it seems to work for me.

@pipponino Which elegant way? You just need to pass the number of activation maps you expect to get.

I meant that maybe the num_classes flag could be used instead of a fixed value. But I'm not sure; maybe I'm saying something wrong.
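Something like this is what I had in mind; FLAGS.num_classes is my guess at the attribute name (I have not checked it against init.py), and the second variant only relies on the checkpoint that test.py already loads a few lines above:

# test.py, around line 23. Option 1: build the network from the command-line
# flag instead of a hard-coded literal, so training and testing stay consistent.
enet = ENet(FLAGS.num_classes)

# Option 2: infer the class count from the checkpoint itself. fullconv appears
# to be a transposed convolution, whose weight is laid out as
# (in_channels, out_channels, kH, kW), so dim 1 is the number of classes.
enet = ENet(checkpoint['state_dict']['fullconv.weight'].shape[1])
enet.load_state_dict(checkpoint['state_dict'])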

In most DL model implementations this is the convention. If you encounter other issues, please reopen the issue.