Sense-X/UniFormer

One 7x7 conv vs. two 3x3 convs

LMMMEng opened this issue · 4 comments

Thank you for your wonderful work!

Were two 3x3 convs (stride=2 each) substituted for one 7x7 conv (stride=4) as the stem because the former leads to better results?

Yes. The double 3x3 convs not only save computation, but also achieve slightly better results.
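For anyone comparing the two stems, here is a minimal sketch (not taken from the repository) checking that both variants downsample the input by the same overall factor of 4. The 224x224 input size and the padding values (3 for the 7x7 conv, 1 for each 3x3 conv) are assumptions; the helper names `conv_out_size` and `stem_out_size` are hypothetical. It does not measure computation, which also depends on channel widths that are not stated in this thread.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of one conv layer (floor convention, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out_size(size, layers):
    """Apply a sequence of (kernel, stride, padding) conv layers to a spatial size."""
    for kernel, stride, padding in layers:
        size = conv_out_size(size, kernel, stride, padding)
    return size


# Assumed paddings: 3 for the 7x7 conv, 1 for each 3x3 conv.
single_7x7 = [(7, 4, 3)]             # one 7x7 conv, stride 4
double_3x3 = [(3, 2, 1), (3, 2, 1)]  # two 3x3 convs, stride 2 each

input_size = 224  # assumed ImageNet-style input resolution
print("one 7x7:", stem_out_size(input_size, single_7x7))
print("two 3x3:", stem_out_size(input_size, double_3x3))
```

Under these assumptions both stems produce the same spatial resolution, so one can be swapped for the other without changing the later stages.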

Thank you! Do you remember exactly how much improvement there was on ImageNet?

Sorry, I'm not sure. But a double 3x3 conv stem is a popular modification in current vision transformers, so you can simply adopt the better setting.

Got it, thank you!