miniImageNet 5-way 1-shot accuracy
chmxu opened this issue · 8 comments
I followed the default settings to train on miniImageNet and used best_model.pth, which gives 59.28% accuracy. That is a huge gap from the reported number, and I don't think it results from the random choice of episodes. Any idea?
Thank you for your interest in our code base.
First, note that each meta-training run can yield a different result. I experienced similar issues with many other few-shot learning algorithms. Also, different versions of packages may cause different behavior (e.g. Python 3 might use a different random seed than Python 2).
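If you want to reduce run-to-run variance from random seeds, pinning them at the start of training helps; here is a minimal sketch (not part of the repository's code, and the seed value is an arbitrary choice):

import random
import numpy as np
import torch

seed = 0  # arbitrary; any fixed value works
random.seed(seed)                  # Python RNG (e.g. episode sampling)
np.random.seed(seed)               # NumPy RNG
torch.manual_seed(seed)            # CPU RNG
torch.cuda.manual_seed_all(seed)   # RNGs on all GPUs
torch.backends.cudnn.deterministic = True   # reproducible cuDNN kernels
torch.backends.cudnn.benchmark = False      # disable autotuner for determinism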
However, the accuracy of 59% seems to be lower than what I and other users of the repository have experienced. Would it be possible for you to report the configuration you used?
Python 3.6, PyTorch 1.2.0, qpth 0.0.15
This is my training script:
python train.py --gpu 1,2 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 --head SVM --network ResNet --dataset miniImageNet --eps 0.1 --episodes-per-batch 2
I think the result is suboptimal because episodes-per-batch is set to 2. In our experiments, we set episodes-per-batch to 8 by default.
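For reference, the same command with the default batch size would be (everything kept from the command above; only --episodes-per-batch changes):

python train.py --gpu 1,2 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 --head SVM --network ResNet --dataset miniImageNet --eps 0.1 --episodes-per-batch 8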
I trained a model on four 1080Ti GPUs with episodes-per-batch set to 8 and used the best_model checkpoint, which reaches 64.13% accuracy on the meta-val set. The model gets 60.42% accuracy on the meta-test set, 2.2% lower than the reported one.
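For context, the evaluation was done with the repository's test script, along these lines (the flag names here are my assumption; check test.py for the exact arguments):

python test.py --gpu 0 --load ./experiments/miniImageNet_MetaOptNet_SVM/best_model.pth --episode 1000 --way 5 --shot 1 --query 15 --head SVM --network ResNet --dataset miniImageNet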
As mentioned in #8, each meta-training run can give a slightly different result. Also, #25 suggests that the results of both ProtoNet and MetaOptNet can vary across different environments. I experienced similar issues with many other few-shot learning algorithms, and I suspect the package versions also matter. In my environment, I never got <61% accuracy with MetaOptNet-SVM when label smoothing is applied.
The message of our paper is the gap between non-parametric base learners and parametric base learners, and I believe that gap should hold within the same environment.
No offence, but this means we can't compare different methods fairly :)
That's a good point. This is why I read the ablation studies section carefully when I read few-shot recognition papers.
Different papers use different engineering factors, like regularization and data loaders, which makes a fair comparison very difficult.
Yes, that's helpful. I share your opinion. Thanks!