kjunelee/MetaOptNet

MiniImageNet 5-way 1-shot accuracy

chmxu opened this issue · 8 comments

chmxu commented

I followed the default settings to train on miniImageNet and evaluated the resulting best_model.pth, which gives 59.28% accuracy, a large gap from the reported number that I don't think can be explained by the random choice of test episodes. Any ideas?

Thank you for your interest in our code base.

First, note that each meta-training run can yield a different result. I have experienced similar issues with many other few-shot learning algorithms. Also, different versions of packages may cause different behavior (e.g., Python 3 might use a different random seed than Python 2).

However, the accuracy of 59% seems to be lower than what I and other users of the repository have experienced. Would it be possible for you to report the configuration you used?
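
For what it's worth, if you want to narrow the run-to-run variation that comes from random seeding, pinning all the RNGs before training usually helps. A minimal sketch follows; the helper and the seed value are my own illustration, not part of the repository's train.py:

```python
# Minimal sketch: pin RNG seeds so episode sampling and weight init are
# reproducible across runs. Not part of MetaOptNet's train.py; the helper
# name and the seed value are illustrative only.
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    random.seed(seed)                  # Python-level episode/class sampling
    np.random.seed(seed)               # NumPy-based augmentation
    torch.manual_seed(seed)            # CPU (and CUDA) weight initialization
    torch.cuda.manual_seed_all(seed)   # all GPUs when using DataParallel
    torch.backends.cudnn.deterministic = True  # deterministic conv kernels
    torch.backends.cudnn.benchmark = False     # disable autotuner nondeterminism

set_seed(0)
```

Note that cuDNN determinism can slow training, and some GPU ops remain nondeterministic, so this reduces the variance rather than eliminating it.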

chmxu commented

Python 3.6, PyTorch 1.2.0, qpth 0.0.15.
This is my training script:

```
python train.py --gpu 1,2 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 --head SVM --network ResNet --dataset miniImageNet --eps 0.1 --episodes-per-batch 2
```

I think the result is suboptimal because episodes-per-batch is set to 2. In our experiments, we set episodes-per-batch to 8 by default.
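
For reference, keeping your flags unchanged except for that one option, the command would look something like this (the GPU indices are whatever your machine has, and a larger episode batch may need more GPU memory):

```
python train.py --gpu 1,2 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 --head SVM --network ResNet --dataset miniImageNet --eps 0.1 --episodes-per-batch 8
```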

chmxu commented

I trained a model on 4 1080Ti GPUs with episodes-per-batch set to 8 and used the best_model checkpoint, which reaches 64.13% accuracy on the meta-val set. The model gets 60.42% accuracy on the meta-test set, 2.2% lower than the reported number.

As mentioned in #8, each meta-training run can produce a slightly different result. Also, #25 suggests that the results of both ProtoNet and MetaOptNet can vary across environments. I have experienced similar issues with many other few-shot learning algorithms, and I suspect package versions matter as well. In my environment, I never saw <61% accuracy with MetaOptNet-SVM when label smoothing is applied.

The message of our paper is about the gap between non-parametric base learners and parametric base learners, and I believe that the gap should exist within the same environment.
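
One way to see how much of a gap episode sampling alone can explain is to report a 95% confidence interval over the test episodes, as few-shot papers typically do. A small sketch is below; the episode_accs array is a hypothetical stand-in for your per-episode accuracies, not something the repository emits in this form:

```python
# Sketch: mean test accuracy with a 95% confidence interval over episodes.
# `episode_accs` is a hypothetical list of per-episode accuracies in [0, 1].
import numpy as np

def mean_and_ci95(episode_accs):
    accs = np.asarray(episode_accs, dtype=np.float64)
    mean = accs.mean()
    # standard error of the mean, scaled by the normal 97.5% quantile
    ci95 = 1.96 * accs.std(ddof=1) / np.sqrt(len(accs))
    return mean, ci95

episode_accs = np.random.uniform(0.4, 0.8, size=1000)  # replace with real results
mean, ci = mean_and_ci95(episode_accs)
print(f"test acc: {mean * 100:.2f} +/- {ci * 100:.2f} %")
```

If the interval over, say, 1000 test episodes is only a fraction of a percent wide, a 2% gap is more likely due to the training run or the environment than to the particular episodes drawn.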

chmxu commented

No offence, but this means we can't compare different methods fairly :)

That's a good point. This is why I carefully read the ablation studies section when I read few-shot recognition papers.

Different papers use different engineering choices, such as regularization and data loaders, which makes a fair comparison very difficult.

chmxu commented

Yes, that's helpful. I share your opinion. Thanks!