facebookresearch/mixup-cifar10

ResNet architectures on CIFAR-10 do not follow Kaiming He

kudkudak opened this issue · 6 comments

I think the ResNet models (ResNet-18, ResNet-110, etc.) do not match https://arxiv.org/abs/1603.05027. The reason is that Kaiming He used a different architecture (under the same name) for CIFAR-10 than for ImageNet. In particular, ResNet-18 should have ~0.3M parameters, while the one used in the repository seems to have ~10M. See Table 6 and the bottom of page 7 of https://arxiv.org/abs/1512.03385.


This is not a bug of course, but it might be a bit misleading.
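To make the size gap concrete, here is a quick parameter count. This is only my own sketch and assumes the repo's `models` package exposes `ResNet18` the same way kuangliu/pytorch-cifar does:

```python
# Hypothetical sanity check (assumes `from models import ResNet18` works as in
# kuangliu/pytorch-cifar); counts the trainable parameters of the repo's ResNet-18.
from models import ResNet18

def count_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(count_params(ResNet18()))  # roughly the ~10M mentioned above, vs ~0.3M for the CIFAR-style ResNet
```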

I am also not sure if the ResNet follows the ImageNet version. Before the softmax there should be a final BN -> ReLU (https://github.com/KaimingHe/resnet-1k-layers/blob/master/resnet-pre-act.lua#L118), which, if I am reading correctly, is absent here: https://github.com/facebookresearch/mixup-cifar10/blob/master/models/resnet.py#L166.
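For reference, here is a minimal sketch of what that head looks like, assuming the pre-activation design from the linked Lua code (illustrative only, not the repo's code):

```python
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the pre-activation ResNet head: the trunk ends with BN -> ReLU
# before global average pooling and the linear classifier.
class PreActHead(nn.Module):
    def __init__(self, num_features, num_classes=10):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features)   # the "final" BN discussed above
        self.linear = nn.Linear(num_features, num_classes)

    def forward(self, x):
        out = F.relu(self.bn(x))                 # BN -> ReLU before pooling
        out = F.adaptive_avg_pool2d(out, 1).flatten(1)
        return self.linear(out)
```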

Actually, the original authors use only 3 stages of residual blocks for the CIFAR-10 ResNets, not 4.
Also, there are no ResNet-18, ResNet-34, and so on for CIFAR-10. Instead, there are ResNet-20, ResNet-32, ResNet-44, ResNet-56, and ResNet-110.
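For context, the CIFAR-10 family in the original paper stacks n basic blocks in each of the three stages (16, 32, and 64 channels), giving a depth of 6n + 2. A quick check of the depths listed above (my own illustration, not code from either repo):

```python
# Depth formula for the CIFAR-10 ResNets of He et al.: 3 stages of n basic
# blocks (2 convs each) + the first conv + the final fc  =>  depth = 6n + 2.
def blocks_per_stage(depth):
    assert (depth - 2) % 6 == 0, "CIFAR ResNet depth must be of the form 6n + 2"
    return (depth - 2) // 6

for depth in (20, 32, 44, 56, 110):
    print(depth, blocks_per_stage(depth))  # n = 3, 5, 7, 9, 18
```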

I am working with ensembles of ResNets on CIFAR-10, so I have a PyTorch implementation of ResNets for CIFAR-10 here:

https://github.com/PabloRR100/ResNets_Ensembles/blob/master/resnetsCIFAR10.py

And here is some documentation I prepared for myself to understand the architecture:
http://pabloruizruiz10.com/resources/CNNs/ResNet-on-CIFAR10.pdf

@kudkudak I would also like to know your thoughts, in case you find any mistakes in my implementation.

One of the authors here :) Thank you both for raising the issue!

We are actually aware of the discrepancy with the architectures in the original ResNet paper. The one we use follows https://github.com/kuangliu/pytorch-cifar, which is wider and built upon pre-activation residual blocks (https://arxiv.org/pdf/1603.05027.pdf), hence it is called PreAct-ResNet-18 in our paper.
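For readers comparing the two block designs, here is a rough sketch of a pre-activation basic block along the lines of kuangliu's `PreActBlock` (simplified for illustration, not our exact code):

```python
import torch.nn as nn
import torch.nn.functional as F

# Simplified pre-activation basic block: BN -> ReLU precede each convolution,
# unlike the post-activation block of the original ResNet paper.
class PreActBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        # projection shortcut only when the shape changes
        self.shortcut = (nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False)
                         if stride != 1 or in_planes != planes else None)

    def forward(self, x):
        out = F.relu(self.bn1(x))
        shortcut = self.shortcut(out) if self.shortcut is not None else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + shortcut
```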

The main reason we chose the wider pre-activation version is its stronger performance. I am sorry if this has caused confusion for some readers. It would be good to have code and results (ERM and mixup) for the original architecture as well, but preferably without removing what is currently here, as it is essential for reproducing the results reported in the mixup paper.

Thanks.

Hi @hongyi-zhang , thanks for the response!

I have gone through the paper on pre-activation residual blocks that you shared, but a few points are still unclear to me.

  • Where do you mention PreAct-ResNet-18 in the paper?

  • In that paper, the CIFAR-10 experiments compare the improvements for ResNet-110 and ResNet-164 when introducing the new residual units. However, ResNet-164 is never mentioned in the original ResNet paper, right? I am assuming ResNet-164 is equivalent to ResNet-110 but with the BasicBlock replaced by the Bottleneck block (giving a total of ~2.5M parameters)? See the quick depth check after this list.

  • The kuangliu implementation you follow still builds architectures that do not match the new paper, since the new architectures are ResNet-164 and ResNet-1001, and no results are reported for depths 18, 34, 50, 101, and 152.
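A quick depth check for the assumption in the second bullet (my own arithmetic, not code from either paper): with the same n = 18 blocks per stage, the basic-block and bottleneck variants give exactly the 110/164 pairing.

```python
n = 18            # blocks per stage in each of the 3 CIFAR stages
print(6 * n + 2)  # 110  (basic block: 2 convs per block)
print(9 * n + 2)  # 164  (bottleneck block: 3 convs per block)
```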

Thanks.

ynd commented

Hi @kudkudak,

Thanks for your careful read of the paper and code. Indeed, the ResNet paper (https://arxiv.org/abs/1603.05027) uses two slightly different architectures under the same name. The name refers to the number of layers in the network; a name that captured every implementation detail would be pretty complicated (especially when cross-validation can change these specifics, such as layer width).

We use the same convention in our work, and we have made sure to run all of our baselines. We appreciate that this leaves some ambiguity, which is why we have provided code for reference and to make the paper reproducible. That is the main purpose of this repo, and changing the code in such a way now would be detrimental to that purpose.