huggingface/pytorch-image-models

[FEATURE] Add model support and pretrained downloadable links for preresnet-18


Is your feature request related to a problem? Please describe.
A good deal of recent literature on quantization/pruning/distillation uses pre-activation ResNet (ResNetV2) for preliminary results. However, because there is no official implementation, reported accuracy varies across papers due to differing implementations, which is confusing for researchers.

Describe the solution you'd like
Add model support and pretrained downloadable links for preresnet-18 so there is an fp32 baseline everyone can refer to.

@a2jinhee pre-act resnet lives here in timm https://github.com/huggingface/pytorch-image-models/blob/main/timm/models/resnetv2.py ... you've probably seen that, but I never added the non-bottleneck block or associated model configs (18/34). I was thinking of training a resnet18 with the recipe adapted from mobilenetv4 small (like https://huggingface.co/timm/resnet50d.ra4_e3600_r224_in1k) ... could possibly do a v2 version of that too...
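
For anyone following along, a quick way to see which pre-act (resnetv2) variants are registered in timm at a given point in time; a minimal sketch assuming a recent timm install, and the attribute layout follows timm/models/resnetv2.py so it may change between versions:

```python
import timm

# List the registered pre-act (resnetv2) model names; at the time of this
# comment only the bottleneck variants had associated pretrained weights.
print(timm.list_models('resnetv2_*'))

# Pull up an existing pre-act bottleneck model to inspect the block structure.
model = timm.create_model('resnetv2_50', pretrained=False)
print(model.stages[0].blocks[0])
```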

@rwightman Yes, thank you for the prompt reply. I have seen the pre-act resnet models you have tagged, but in my domain (model compression), we use smaller pre-act resnet-18 models instead of the bigger ones to ensure a particular method's scalability. So I thought it would be nice if there was an official implementation of this. Should I make a PR? I have tweaked some existing open source code to match pre-act resnet-18, but I don't have an ImageNet pretrained model due to limited compute resources.

@a2jinhee I merged the model configs early, weights are in training though...

@rwightman Thank you for the implementations, I think many will benefit from this! One quick question: what is the difference between resnet18v2 and resnet18v2d? Are there papers I can reference for this?
Thank you once again!

@a2jinhee 'D' is a 'bag of tricks' variant of ResNet; it was originally added on top of the torch-style ResNet, which is called V1B (sometimes v1.5), but the changes work well w/ v2.

It has

  • three 3x3 convolutions in the stem instead of one 7x7 like the original
  • the downsample in the shortcut uses an avg pool + unstrided 1x1 conv instead of a strided 1x1 conv that throws away info

I have a number of D variants trained for the normal ResNet models in timm, also something I called a 'T' variant, which is a D but uses a 'tiered' channel progression across the stem convs.
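
To make the two 'D' changes concrete, here is a rough PyTorch sketch (not timm's exact code; channel widths are illustrative) of the original stem/shortcut next to their 'D' counterparts:

```python
import torch.nn as nn

# Original ResNet stem: a single strided 7x7 conv.
stem_original = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# 'D' (deep) stem: three 3x3 convs, only the first one strided.
stem_d = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# Original shortcut downsample: a strided 1x1 conv, which samples only
# one of every four spatial positions.
downsample_original = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

# 'D' shortcut downsample: avg pool handles the striding, then an
# unstrided 1x1 conv changes the channel count.
downsample_d = nn.Sequential(
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 128, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(128),
)
```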

I see, I've seen a lot of variants that used a 3x3 conv for smaller datasets, and 7x7 for big ones. So in this case, 'D' would be better for smaller datasets like CIFAR-10, whereas the original would be better for bigger datasets like ImageNet.

Please correct me if I'm wrong.

@a2jinhee the weights have arrived. #2316

What you're referring to, the 3x3 conv for smaller (image size) datasets, is different. The cifar/mnist oriented ResNets do use only one 3x3 conv in the stem, but that's part of a different goal: they were trying to keep params down and use fewer stride ops (note that conv is stride 1, there is no maxpool, and I think one fewer layer w/ stride for those ResNets, etc). The ImageNet (and larger) image size focused ResNets have an overall stride of 32 (5 layers with a stride=2 reduction, reducing the input image by 32x), so the output feature map is 7x7 for an input of 224. That is too much spatial reduction for 32x32 or smaller images.
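
A quick way to check that 32x reduction in practice, a sketch using timm's feature extraction API (assuming a recent timm; the printed shapes are what I'd expect for a 224 input):

```python
import torch
import timm

# Sanity check of the spatial reduction described above: an ImageNet-style
# ResNet reduces a 224x224 input by 32x, ending at a 7x7 feature map.
model = timm.create_model('resnet18', features_only=True, pretrained=False)
features = model(torch.randn(1, 3, 224, 224))
for feat, reduction in zip(features, model.feature_info.reduction()):
    print(tuple(feat.shape), 'reduction =', reduction)
# The last entry should be roughly (1, 512, 7, 7) with reduction = 32.
```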

'D' replaces the single 7x7 conv with 3 3x3 convs. The 'D' variant overall yields better results than the non-D variant for a very small param overhead, but does have a bit more drag on throughput due to a few more layers (technically they end up slightly deeper, as each extra conv is paired with another norm + act) + the avg pools in the downsamples.
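
For a rough sense of that param overhead, one can compare the 'D' and non-D models directly in timm (exact totals may differ slightly across versions):

```python
import timm

# Compare parameter counts of the plain and 'D' variants of the same depth.
for name in ('resnet50', 'resnet50d'):
    m = timm.create_model(name, pretrained=False)
    n_params = sum(p.numel() for p in m.parameters())
    print(f'{name}: {n_params / 1e6:.2f}M params')
```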