Res2Net/Res2Net-PretrainedModels

Res2NeXt on Cifar100

qiangwang57 opened this issue · 13 comments

Hi @gasvn ,

Thanks for the brilliant work!

I have a couple of simple questions regarding Res2NeXt on Cifar100.

  1. The ImageNet implementation uses a block without hierarchical addition for downsampling, but the code you mentioned in other issue threads (https://gist.github.com/gasvn/cd7653ef93fb147be05f1ae4abad6589) instead uses group convolutions in the first block of each stage for downsampling. I wonder which one is the correct one?
  2. Did you use batch size 256 or 128 for training? I saw your initial LR was set to 0.05, which is what ResNeXt used for batch size 256.

Best wishes,

Qiang

gasvn commented

What's your reproduced number? The downsampling module has no hierarchical addition, and using either a group conv or the Res2Net ImageNet form for the downsampling module gives similar results on CIFAR-100. I used a batch size of 64 and lr=0.05 on CIFAR-100 without tuning.
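For concreteness, here is a minimal sketch of that distinction, adapted from my reading of the public Res2Net ImageNet code; the class name, argument names, and the omitted residual path are simplifications, not the exact training code.

```python
# Sketch of a Res2Net-style bottleneck. In the first block of a stage
# (stype='stage') the splits are processed independently (no hierarchical
# addition) and the untouched last split is average-pooled so sizes match.
import torch
import torch.nn as nn


class Bottle2neckSketch(nn.Module):
    def __init__(self, inplanes, planes, stride=1, scale=4, stype='normal'):
        super().__init__()
        width = planes // scale
        self.nums = scale - 1
        self.stype = stype
        self.width = width
        self.conv1 = nn.Conv2d(inplanes, width * scale, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(width * scale)
        # one 3x3 conv per split; the last split is passed through (or pooled)
        self.convs = nn.ModuleList([
            nn.Conv2d(width, width, 3, stride=stride, padding=1, bias=False)
            for _ in range(self.nums)])
        self.bns = nn.ModuleList([nn.BatchNorm2d(width) for _ in range(self.nums)])
        # used only in the downsampling ('stage') block
        self.pool = nn.AvgPool2d(3, stride=stride, padding=1)
        self.conv3 = nn.Conv2d(width * scale, planes, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        spx = torch.split(out, self.width, dim=1)
        outs = []
        for i in range(self.nums):
            if i == 0 or self.stype == 'stage':
                # downsampling block: no hierarchical addition between splits
                sp = spx[i]
            else:
                # normal block: add the previous split's output (hierarchical)
                sp = sp + spx[i]
            sp = self.relu(self.bns[i](self.convs[i](sp)))
            outs.append(sp)
        # last split: identity in a normal block, avg-pooled in a stage block
        outs.append(self.pool(spx[self.nums]) if self.stype == 'stage' else spx[self.nums])
        # residual/shortcut path omitted for brevity
        return self.relu(self.bn3(self.conv3(torch.cat(outs, dim=1))))
```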

gasvn commented

Please let me know if you still cannot reproduce our results.

Thanks @gasvn for the timely response.

I followed the ImageNet architecture with batch size 128 and lr 0.1 on 4 GPUs. I managed to reproduce the ResNeXt results, but for Res2NeXt I only get 80.78.

The only difference I found is the mean and std, where yours are
mean = [0.485, 0.456, 0.406],
std = [0.229, 0.224, 0.225],
which are the ImageNet statistics. I wonder why you chose them for cifar100?
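For clarity, this is the contrast I mean; the CIFAR-100 statistics below are the commonly cited ones, not values taken from your code.

```python
# Sketch of the two normalization choices being discussed.
import torchvision.transforms as T

imagenet_norm = T.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225])      # stats used in the gist

cifar100_norm = T.Normalize(mean=[0.5071, 0.4865, 0.4409],  # approximate CIFAR-100 stats
                            std=[0.2673, 0.2564, 0.2762])

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),    # standard CIFAR augmentation
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    imagenet_norm,                  # swap in cifar100_norm to test the difference
])
```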

gasvn commented

I didn't notice this when I was training the Res2NeXt. Maybe you can try using one GPU with batch size 64 as I did. From my experience, it should not be hard to reproduce the result. I will send you my code once I find it.

Thanks @gasvn , I will try it and get back to you with the results.

Hi @gasvn ,

I have tried different combinations of downsampling block, batch size, lr, # GPUs, and mean and std, but unfortunately I did not manage to reproduce the results, or even come close. The best run so far still has an error of over 18%.

gasvn commented

Have you managed to reproduce our results?

gasvn commented

I managed to find the code I used for training the res2net on cifar100.
It can reproduce the result of Res2NeXt-29, 6c×24w×4s

  • BestPrec so far@1 83.020 in epoch 273

https://gist.github.com/gasvn/a1793919427f799e74bb7c900af11d4c

Perfect! Thank you very much! I will let you know the results!

I assume you used the following parameters for training:
batch size: 64
init LR: 0.05
single GPU
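In other words, something along these lines; the weight decay and LR schedule below are my guesses, not values from your script.

```python
# Sketch of the assumed training setup: single GPU, batch size 64, initial LR 0.05.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))  # stand-in for Res2NeXt-29

train_set = torchvision.datasets.CIFAR100(
    root='./data', train=True, download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=4)  # batch size 64, single GPU

optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9,
                            weight_decay=5e-4, nesterov=True)  # init LR 0.05
# the milestones below are a typical 300-epoch CIFAR schedule, not confirmed in this thread
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[150, 225], gamma=0.1)
```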

Apart from those, is there anything else I need to pay special attention to?

Cheers, Qiang

gasvn commented

Have you managed to reproduce our results? Sorry, there is nothing else I can help you with.

When stride=2, the feature maps before and after have different sizes; how are they fused? Wouldn't adding them directly cause a problem?

Unfortunately, I did not manage to reproduce the results, or even come close.

Anyway, really appreciate your help!