Questions regarding inference

Hi, I am testing your repo with the Stanford Dogs dataset, which has 120 breeds. The model trained really well, but I am a little confused by your inference pipeline. I just want to run inference on a single image, so I am looking at your eval.py and plot_heat.py at the moment.

Your eval.py seems to be calling SwinVit12, but plot_heat.py seems to be calling SwinVit12_demo. Is there a difference between the two?

Just tried running eval.py and I am getting:

RuntimeError: Error(s) in loading state_dict for SwinVit12:
        size mismatch for gcn.adj1: copying a param with shape torch.Size([85, 85]) from checkpoint, the shape in current model is torch.Size([15, 15]).
        size mismatch for gcn.pool1.weight: copying a param with shape torch.Size([85, 2720]) from checkpoint, the shape in current model is torch.Size([15, 480]).
        size mismatch for gcn.pool1.bias: copying a param with shape torch.Size([85]) from checkpoint, the shape in current model is torch.Size([15]).
        size mismatch for gcn.pool4.weight: copying a param with shape torch.Size([1, 85]) from checkpoint, the shape in current model is torch.Size([1, 15]).

Seems like something is not configured properly on my end.

I did manage to write an inference script. Maybe you could consider adding one to the repo so that an image can be fed directly into the model without labels.

One thing I noticed is that the model contains ['ori', 'l_1', 'l_3', 'gcn'], etc., and the model itself is huge (1.6 GB). If I want to make the model smaller without any additional steps like pruning or quantizing, how would you do that inside config.py?

Hi, I am testing your repo with the Stanford Dogs dataset, which has 120 breeds. The model trained really well, but I am a little confused by your inference pipeline. I just want to run inference on a single image, so I am looking at your eval.py and plot_heat.py at the moment.

Your eval.py seems to be calling SwinVit12, but plot_heat.py seems to be calling SwinVit12_demo. Is there a difference between the two?

SwinVit12 and SwinVit12_demo use the same model. The only difference between the two is what they return.

Just tried running eval.py and I am getting:

RuntimeError: Error(s) in loading state_dict for SwinVit12:
        size mismatch for gcn.adj1: copying a param with shape torch.Size([85, 85]) from checkpoint, the shape in current model is torch.Size([15, 15]).
        size mismatch for gcn.pool1.weight: copying a param with shape torch.Size([85, 2720]) from checkpoint, the shape in current model is torch.Size([15, 480]).
        size mismatch for gcn.pool1.bias: copying a param with shape torch.Size([85]) from checkpoint, the shape in current model is torch.Size([15]).
        size mismatch for gcn.pool4.weight: copying a param with shape torch.Size([1, 85]) from checkpoint, the shape in current model is torch.Size([1, 15]).

Seems like something is not configured properly on my end.

This problem seems to be caused by the number of selections not matching between training and evaluation.

So please check that the --num_selects argument in FGVC-PIM/config.py (line 31 in 3ecf29c) and the --num_selects argument in FGVC-PIM/config_eval.py (line 31 in 3ecf29c) match.
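To confirm which parameters disagree before editing either config, the shapes can be compared directly. The following is a minimal sketch, not part of the repo: find_shape_mismatches is a hypothetical helper, and the shapes are copied from the error message above. With PyTorch, a shape dictionary can be built from any state dict with {k: tuple(v.shape) for k, v in state_dict.items()}; how the state dict is stored inside the checkpoint file is an assumption you would need to check.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ.

    Hypothetical helper for illustration; it is not part of FGVC-PIM.
    """
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Keys missing from the model are skipped; only shape conflicts are reported.
        if model_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# Shapes copied from the RuntimeError above.
checkpoint_shapes = {
    "gcn.adj1": (85, 85),
    "gcn.pool1.weight": (85, 2720),
    "gcn.pool1.bias": (85,),
    "gcn.pool4.weight": (1, 85),
}
model_shapes = {
    "gcn.adj1": (15, 15),
    "gcn.pool1.weight": (15, 480),
    "gcn.pool1.bias": (15,),
    "gcn.pool4.weight": (1, 15),
}

for name, ckpt, model in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt} vs current model {model}")
```

If every mismatched key sits under gcn, as it does here, that is consistent with the selection settings differing between the two config files rather than with a wrong backbone.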

I did manage to write an inference script. Maybe you could consider adding one to the repo so that an image can be fed directly into the model without labels.

One thing I noticed is that the model contains ['ori', 'l_1', 'l_3', 'gcn'], etc., and the model itself is huge (1.6 GB). If I want to make the model smaller without any additional steps like pruning or quantizing, how would you do that inside config.py?

To make the model smaller, you may need to choose a smaller model. We will release a more efficient model and structure soon.
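Before choosing what to shrink, it can help to see which part of the checkpoint accounts for most of the size. The sketch below is an illustration under stated assumptions: size_by_prefix is a hypothetical helper, parameters are assumed to be stored as 4-byte float32 values, and the example shapes are only the gcn entries quoted in the error message, not the full model.

```python
from collections import defaultdict
from math import prod
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def size_by_prefix(shapes: Dict[str, Shape], bytes_per_param: int = 4) -> Dict[str, int]:
    """Sum the estimated storage in bytes per top-level module name.

    Hypothetical helper; assumes float32 (4 bytes per parameter) by default.
    """
    totals: Dict[str, int] = defaultdict(int)
    for name, shape in shapes.items():
        # "gcn.pool1.weight" is grouped under "gcn".
        prefix = name.split(".", 1)[0]
        totals[prefix] += prod(shape) * bytes_per_param
    return dict(totals)


# Only the gcn shapes from the checkpoint in the error message are known here.
example_shapes = {
    "gcn.adj1": (85, 85),
    "gcn.pool1.weight": (85, 2720),
    "gcn.pool1.bias": (85,),
    "gcn.pool4.weight": (1, 85),
}

for prefix, num_bytes in size_by_prefix(example_shapes).items():
    print(f"{prefix}: {num_bytes / 1e6:.2f} MB")
```

Running the same grouping over the full state dict would show how the 1.6 GB is distributed across the top-level modules.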

I see. Thank you for the reply!