One 7x7 conv vs. two 3x3 convs
LMMMEng opened this issue · 4 comments
LMMMEng commented
Thank you for your wonderful work!
Are two 3x3 convs (stride=2 each) used in place of one 7x7 conv (stride=4) as the stem because they lead to better results?
Andy1621 commented
Yes. Two 3x3 convs not only save computation but also give slightly better results.
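For readers comparing the two stems, here is a minimal sketch of the shape arithmetic only. It is not taken from the repository: the helper name `stem_geometry`, the padding values (3 for the 7x7 conv, 1 for each 3x3 conv), and the 224-pixel input are assumptions chosen for illustration. Under those assumptions, both stems end up with the same receptive field, the same total stride, and the same output size, which is why one can stand in for the other.

```python
def stem_geometry(layers, size):
    """Return (receptive_field, total_stride, output_size) for stacked convs.

    layers: list of (kernel, stride, padding) tuples, applied in order.
    size:   input height (or width) in pixels.
    """
    receptive_field, total_stride = 1, 1
    for kernel, stride, padding in layers:
        # Each layer widens the receptive field by (kernel - 1) input-level steps.
        receptive_field += (kernel - 1) * total_stride
        total_stride *= stride
        # Standard conv output-size formula (dilation 1, floor division).
        size = (size + 2 * padding - kernel) // stride + 1
    return receptive_field, total_stride, size


# Assumed paddings: 3 for the 7x7 conv, 1 for each 3x3 conv (hypothetical values).
single_7x7 = [(7, 4, 3)]
double_3x3 = [(3, 2, 1), (3, 2, 1)]

print(stem_geometry(single_7x7, 224))  # (7, 4, 56)
print(stem_geometry(double_3x3, 224))  # (7, 4, 56)
```

This covers geometry only. The compute cost and accuracy of each stem also depend on the channel widths and on any normalization or activation placed between the two 3x3 convs, none of which is modeled here.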
LMMMEng commented
Thank you! Do you remember exactly how much improvement there was on ImageNet?
Andy1621 commented
Sorry, I'm not sure. But a double 3x3 conv stem is a popular modification in current vision transformers, so you can simply adopt the better setting.
LMMMEng commented
Got it, thank you!