huggingface/pytorch-image-models

[FEATURE] Latest Meta Data

david-klindt opened this issue · 3 comments

Would it be possible to either:

a) update the current model metadata here
or
b) provide some functionality to compute this myself for any newly added model

Thank you so much for this amazing resource!!!

@david-klindt unfortunately that was done by hand, hence I haven't had a chance to update it.

On the plus side though, since that was last done the pretrained tag system was added, so there'd more or less be a path to do it programmatically with some string matching on the part of the model names after the '.' ... it'd be much faster than doing it by hand, but there are a lot more models now, so it would still be some work to match tags -> metadata

e.g.

timm.list_pretrained()

['bat_resnext26ts.ch_in1k',
 'beit_base_patch16_224.in22k_ft_in22k',
 'beit_base_patch16_224.in22k_ft_in22k_in1k',
 'beit_base_patch16_384.in22k_ft_in22k_in1k',
 'beit_large_patch16_224.in22k_ft_in22k',
 'beit_large_patch16_224.in22k_ft_in22k_in1k',
 'beit_large_patch16_384.in22k_ft_in22k_in1k',
 'beit_large_patch16_512.in22k_ft_in22k_in1k',
 'beitv2_base_patch16_224.in1k_ft_in1k',
 'beitv2_base_patch16_224.in1k_ft_in22k',
 'beitv2_base_patch16_224.in1k_ft_in22k_in1k',
 'beitv2_large_patch16_224.in1k_ft_in1k',
 'beitv2_large_patch16_224.in1k_ft_in22k',
 'beitv2_large_patch16_224.in1k_ft_in22k_in1k',
 'botnet26t_256.c1_in1k',
 'caformer_b36.sail_in1k',
 'caformer_b36.sail_in1k_384',
 'caformer_b36.sail_in22k',
...

there's a bit of a scheme to those tags, but it's not rigid

generally

source / recipe _ pretrain dataset _ finetune/final dataset _ finetune/final resolution
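
To make that concrete, here's a rough sketch of what that string matching could look like. The helper names and the dataset/field heuristics below are mine, just for illustration; only `timm.list_pretrained()` is the actual API:

    import re
    import timm

    def split_pretrained_name(name):
        # 'beit_base_patch16_224.in22k_ft_in22k_in1k' -> ('beit_base_patch16_224', 'in22k_ft_in22k_in1k')
        arch, _, tag = name.partition('.')
        return arch, tag

    def rough_tag_fields(tag):
        # heuristically bucket the '_'-separated tag fields; the labels here are illustrative only
        fields = {'source_or_recipe': [], 'datasets': [], 'resolution': None, 'other': []}
        for part in tag.split('_'):
            if re.fullmatch(r'\d{3,4}', part):
                fields['resolution'] = int(part)          # trailing number -> finetune/final resolution
            elif re.match(r'(in1k|in12k|in21k|in22k|laion)', part):
                fields['datasets'].append(part)           # dataset-ish tokens
            elif part in ('ft', 'dist'):
                fields['other'].append(part)              # markers like finetuned / distilled
            else:
                fields['source_or_recipe'].append(part)   # leading source or timm recipe tag
        return fields

    for name in timm.list_pretrained()[:5]:
        arch, tag = split_pretrained_name(name)
        print(arch, rough_tag_fields(tag))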

The 'recipe' tags usually mean timm-specific training, so natively trained timm models will usually start with the recipe tag right after the dot. The best bit of code for decoding timm recipes is the snippet below. There are also recipe tags for some 3rd party models, e.g. the tf_efficientnet* weights carry recipe tags for AdvProp, NoisyStudent, RandAugment, etc.

    def decode_recipe(pt_tag, model_card):
        """Decode the timm training recipe from a pretrained tag, e.g. 'a1h_in1k' -> prefix 'a1h'.

        Wrapped as a function here for readability; in the model card script this runs inline
        per pretrained weight. Fills training details into the card and returns the detail dict
        plus flags for the matched recipe family.
        """
        model_card['details']['Original'] = 'https://github.com/huggingface/pytorch-image-models'
        ptp = pt_tag.split('_', 1)[0]  # recipe prefix = part of the tag before the first '_'
        a_l = {'a1h', 'a1', 'a2', 'a3', 'ah'}       # ResNet Strikes Back 'A' recipes
        b_l = {'b1', 'b2', 'b1k', 'b2k'}            # ResNet Strikes Back 'B' recipes
        c_l = {'c', 'c1', 'c2', 'c3', 'ch'}         # ResNet Strikes Back 'C' recipes
        d_l = {'d', 'd1', 'd2'}                     # ResNet Strikes Back 'D' recipes
        ra_l = {'ra3', 'ra2', 'ra', 'racm', 'raa'}  # timm RandAugment recipes
        am_l = {'ram', 'am'}                        # AugMix recipes
        train_detail = {}
        has_rsb = False     # recipe is related to ResNet Strikes Back
        has_tricks = False  # recipe is related to Bag-of-Tricks
        if ptp in a_l:
            if ptp.endswith('h'):
                train_detail['desc'] = 'Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `A1` recipe'
                train_detail['opt'] = 'LAMB optimizer'
                if ptp == 'a1h':
                    train_detail['other'] = 'Stronger dropout, stochastic depth, and RandAugment than paper `A1` recipe'
                elif ptp == 'ah':
                    train_detail['other'] = 'No CutMix. Stronger dropout, stochastic depth, and RandAugment than paper `A1` recipe'
            else:
                train_detail['desc'] = f'ResNet Strikes Back `{ptp.upper()}` recipe'
                train_detail['opt'] = 'LAMB optimizer with BCE loss'
            train_detail['sched'] = 'Cosine LR schedule with warmup'
            has_rsb = True
        elif ptp in b_l:
            train_detail['desc'] = 'Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `B` recipe (equivalent to `timm` `RA2` recipes)'
            train_detail['opt'] = 'RMSProp (TF 1.0 behaviour) optimizer'
            train_detail['sched'] = 'Step (exponential decay w/ staircase) LR schedule with warmup'
            has_rsb = True
        elif ptp in c_l:
            train_detail['desc'] = 'Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `C` recipes'
            train_detail['opt'] = 'SGD (w/ Nesterov) optimizer and AGC (adaptive gradient clipping).'
            if ptp.endswith('h'):
                train_detail['other'] = 'Stronger dropout, stochastic depth, and RandAugment than paper `C1`/`C2` recipes'
            train_detail['sched'] = 'Cosine LR schedule with warmup'
            has_rsb = True
        elif ptp in d_l:
            train_detail['desc'] = 'Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `D` recipe'
            train_detail['opt'] = 'AdamW optimizer using BCE loss'
            train_detail['sched'] = 'Cosine LR schedule with warmup'
            has_rsb = True
        elif ptp == 'sw':
            train_detail['desc'] = 'Based on Swin Transformer train / pretrain recipe with modifications (related to both DeiT and ConvNeXt recipes)'
            train_detail['opt'] = 'AdamW optimizer, gradient clipping, EMA weight averaging'
            train_detail['sched'] = 'Cosine LR schedule with warmup'
        elif ptp in ra_l:
            train_detail['desc'] = f'RandAugment `{ptp.upper()}` recipe. Inspired by and evolved from EfficientNet RandAugment recipes. Published as `B` recipe in [ResNet Strikes Back](https://arxiv.org/abs/2110.00476).'
            train_detail['opt'] = 'RMSProp (TF 1.0 behaviour) optimizer, EMA weight averaging'
            train_detail['sched'] = 'Step (exponential decay w/ staircase) LR schedule with warmup'
            has_rsb = True
        elif ptp in am_l:
            if ptp == 'ram':
                train_detail['desc'] = 'AugMix (with RandAugment) recipe'
            else:
                train_detail['desc'] = 'AugMix recipe'
            train_detail['opt'] = 'SGD (w/ Nesterov) optimizer and JSD (Jensen–Shannon divergence) loss'
            train_detail['sched'] = 'Cosine LR schedule with warmup'
        elif ptp == 'bt':
            has_tricks = True
            train_detail['desc'] = 'Bag-of-Tricks recipe'
            train_detail['opt'] = 'SGD (w/ Nesterov) optimizer'
            train_detail['sched'] = 'Cosine LR schedule with warmup'
        else:
            assert False, f'unrecognized recipe tag prefix: {ptp}'
        return train_detail, has_rsb, has_tricks
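
For instance, using the `decode_recipe` wrapper name above (the name is just for illustration), a call on a timm recipe tag could look like:

    card = {'details': {}}
    detail, has_rsb, has_tricks = decode_recipe('a1h_in1k', card)  # recipe 'a1h', dataset 'in1k'
    print(detail['desc'])  # Based on [ResNet Strikes Back](https://arxiv.org/abs/2110.00476) `A1` recipe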

If it's an official weight from a standard player like Google, Facebook (Meta), SAIL, etc., the part after the '.' will start with 'fb', 'goog'/'tf', etc.

If the model is trained with distillation, there should usually be a 'dist' in the tag.
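
A rough sketch of those two checks (the source set here is an illustrative subset, and 'fb_dist_in1k' is a made-up tag just to exercise the distillation check):

    KNOWN_SOURCES = {'fb', 'goog', 'tf', 'sail'}  # illustrative subset, not exhaustive

    def source_and_distilled(pt_tag):
        # first tag field is either a source ('fb', 'goog', ...) or a timm recipe tag
        first = pt_tag.split('_', 1)[0]
        source = first if first in KNOWN_SOURCES else None
        distilled = 'dist' in pt_tag.split('_')
        return source, distilled

    print(source_and_distilled('sail_in1k'))     # ('sail', False)
    print(source_and_distilled('fb_dist_in1k'))  # ('fb', True)  hypothetical tag, for illustration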

There is also knowledge implicit in the model itself, e.g. you need to know that 'beit' is a ViT model but with its own pretrain scheme, so the pretrain dataset is in22k for the best ones, but it wasn't supervised pretraining. Similarly, the 'clip' variants of some models were pretrained with CLIP image-text contrastive learning, etc.

Thank you so much for the fast response Ross!

I can see that it is not easy to bring this whole zoo into one unified taxonomy. However, the scheme you provided does help to some extent.

Again, thank you for this great resource and the fast maintenance 🙏