rwightman/gen-efficientnet-pytorch

Pretrained weights failing to load (EffNet-B5)?

lessw2020 opened this issue · 3 comments

Hi @rwightman,
Thanks first for this awesome repo. I'm trying to use your impl to get the AP pre-trained B5, but it's quite clearly failing to load the pretrained weights, with neither an error nor a confirmation that the weights were loaded. Is this a known issue or am I doing something wrong? (Edit: OK, I re-read the readme and think I misunderstood "AP implemented" to mean that pretrained weights were available. Anyway, if so, then passing pretrained=True where no weights exist should ideally print a warning or error?)

1 - Installed via pip install geffnet
2 - import geffnet
3 - model = geffnet.create_model('efficientnet_b5',num_classes = data.c,pretrained=True, drop_rate=0.2, drop_connect_rate=0.2)#, as_sequential=True)
Normally I'm used to seeing a "loading .pth" message and the progress bar here on a new instance, or a confirmation that the weights were loaded. I saw neither, but no error either.
4 - When you go to train, it becomes abundantly clear that it's working with a newly initialized network (i.e. the first epoch is close to random, followed by very slow training progress). By contrast, a pretrained network digs right in.

If possible, it would be great to get a confirmation message like in the Melas impl once weights are loaded: "Loaded pretrained weights for efficientnet-b5". Or, if not, a warning that it's a newly initialized network when pretrained=True was passed in?

Thanks much!
Less

@lessw2020 the model names without a tf_ prefix are for natively PyTorch trained weights. There aren't any weights beyond B2 (https://github.com/rwightman/gen-efficientnet-pytorch/blob/master/geffnet/gen_efficientnet.py#L69)

It takes a REALLY long time to train these models from scratch with the GPU setup of a mere mortal (2-3 gamer GPUs per machine). I actually trained B3 recently and it took between 3 and 4 weeks on dual Titan RTX. Those latest results and training hparams are at (https://github.com/rwightman/pytorch-image-models) ... I will update the B2/B3 models here soon; I was hoping to have more results first....

If you want the B5 AdvProp weights, those are ported from TF; use tf_efficientnet_b5_ap for that.

And yeah, I should have a 'random initialization, weights not loaded' warning, I thought I had it, but looks like that was in the other one :)
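For illustration, a minimal sketch of what such a warning could look like. This is not the repo's actual code: `PRETRAINED_URLS`, `resolve_pretrained_url`, and the placeholder URL are hypothetical names invented for this example; only Python's standard `warnings` module is real API.

```python
import warnings

# Hypothetical registry mapping model names to checkpoint URLs.
# An empty string means no pretrained weights are published for that name.
PRETRAINED_URLS = {
    "efficientnet_b0": "https://example.com/efficientnet_b0.pth",  # placeholder URL
    "efficientnet_b5": "",
}


def resolve_pretrained_url(model_name, pretrained):
    """Return the checkpoint URL to load, or None if the model stays randomly initialized."""
    if not pretrained:
        return None
    url = PRETRAINED_URLS.get(model_name, "")
    if not url:
        # Tell the caller that pretrained=True could not be honored.
        warnings.warn(
            f"No pretrained weights available for '{model_name}'; "
            "using random initialization.",
            stacklevel=2,
        )
        return None
    return url
```

Under these assumptions, calling `resolve_pretrained_url("efficientnet_b5", True)` would emit the warning and return `None`, so the caller knows training is starting from scratch.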

Also beware, the AdvProp models use a different normalization (Inception style), not the same one as the other models (ResNet style).
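To make the difference concrete, here is a small sketch of the two normalization schemes. The constants are the commonly used values (ImageNet mean/std for "ResNet style", 0.5/0.5 for "Inception style"); the helper name `normalize_pixel` is made up for illustration, and it is worth checking the model's own default config for the values it actually expects.

```python
# Per-channel (R, G, B) statistics for inputs already scaled to [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # "ResNet style"
IMAGENET_STD = (0.229, 0.224, 0.225)
INCEPTION_MEAN = (0.5, 0.5, 0.5)       # "Inception style", maps [0, 1] to [-1, 1]
INCEPTION_STD = (0.5, 0.5, 0.5)


def normalize_pixel(rgb, mean, std):
    """Normalize one RGB pixel (values in [0, 1]) channel by channel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# The same white pixel lands on different values under the two schemes,
# which is why feeding a model the wrong normalization hurts accuracy.
white = (1.0, 1.0, 1.0)
resnet_style = normalize_pixel(white, IMAGENET_MEAN, IMAGENET_STD)
inception_style = normalize_pixel(white, INCEPTION_MEAN, INCEPTION_STD)
```

In a torchvision pipeline, the matching mean/std tuples would typically be passed to `transforms.Normalize` instead of normalizing by hand.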

@rwightman - thanks a bunch for the clarification, as well as the pointer re: Inception style normalization!
Re: 3-4 weeks with 2-3 gamer GPUs - I believe it! All the more reason I greatly appreciate the work you are doing with this repo! My impression has always been that EfficientNets are a beast to train, and you've confirmed it here. The upside is that I'm seeing the best results with EffNets of any arch I've tried on my current project, so I'll be diving deep with EffNets for some time now.

re: "And yeah, I should have a 'random initialization, weights not loaded' warning, I thought I had it, but looks like that was in the other one :)"
That would be awesome if you have time to add it, just to keep people from panicking like I did when I saw my results after the first two epochs lol.

Anyway I'll close this since there are no weights to preload and again, I greatly appreciate all the work you are doing on this repo!