Some confusion about random mixing
nbl97 opened this issue · 5 comments
Hi~ Many thanks for your excellent work and codebase. I still have some questions about the random mixing operator:
- In MetaFormer v1, I noticed that random mixing is followed by Softmax, while in v2 there is no Softmax.
- According to my understanding, the `spatialfc` class corresponds to random mixing, but its weights do not seem to be frozen in the codebase.

If you could explain how random mixing works, I would appreciate it!
Hi @nbl97 , thanks for your attention.
- For both v1 and v2, softmax is utilized to normalize the random matrix, see https://github.com/sail-sg/metaformer/blob/main/metaformer_baselines.py#L301
- I did not put the random mixing code in the poolformer repo; the `spatialfc` class refers to the spatial MLP.
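For readers following along, the softmax-normalized random mixing described above could be sketched roughly like this (a minimal PyTorch illustration based on the linked line, not a verbatim copy of the repo's code; shapes and defaults are assumptions):

```python
import torch
import torch.nn as nn

class RandomMixing(nn.Module):
    """Token mixer that mixes tokens with a FIXED random matrix.

    Softmax normalizes each row of the random matrix so it forms a
    convex combination over tokens; requires_grad=False keeps the
    matrix frozen (it is not learned during training).
    """
    def __init__(self, num_tokens=196):
        super().__init__()
        self.random_matrix = nn.Parameter(
            torch.softmax(torch.rand(num_tokens, num_tokens), dim=-1),
            requires_grad=False,
        )

    def forward(self, x):
        # x: [B, H, W, C]; flatten spatial dims into N = H * W tokens
        B, H, W, C = x.shape
        x = x.reshape(B, H * W, C)
        # Mix tokens: each output token is a fixed random convex
        # combination of all input tokens
        x = torch.einsum('mn,bnc->bmc', self.random_matrix, x)
        return x.reshape(B, H, W, C)
```

Because the matrix is frozen, the softmax normalization is the only thing keeping its output at a reasonable scale, which matches the explanation above.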
@yuweihao Thanks for your clarification! It was my mistake that I confused `spatialfc` with random mixing. It seems to me that `spatialfc` is a learnable version of random mixing (ignoring softmax). Can I infer that `spatialfc` outperforms random mixing? If softmax is necessary, would `spatialfc` + softmax achieve better performance than `spatialfc` alone? Looking forward to your insights.
Yes, `spatialfc` can be regarded as a learnable version of random mixing. Thus, `spatialfc` will outperform random mixing because of its learnable parameters. Since random mixing's parameters cannot be learned, softmax is necessary to normalize the random matrix. I have not conducted experiments for `spatialfc` + softmax, so I am not sure whether it can achieve better performance than plain `spatialfc`. I guess the performance of `spatialfc` + softmax and plain `spatialfc` will be similar.
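To make the "learnable version" relationship concrete, here is a hypothetical `SpatialFc`-style mixer (the name and layout are illustrative, not the repo's exact implementation): it replaces the frozen random matrix with a trainable linear map over the token dimension.

```python
import math
import torch
import torch.nn as nn

class SpatialFc(nn.Module):
    """Spatial MLP token mixer: a LEARNABLE N x N map over tokens.

    Structurally the same operation as random mixing, except the
    mixing matrix is a trainable nn.Linear weight instead of a
    frozen softmax-normalized random matrix.
    """
    def __init__(self, spatial_shape=(14, 14)):
        super().__init__()
        num_tokens = math.prod(spatial_shape)  # N = H * W
        self.fc = nn.Linear(num_tokens, num_tokens, bias=False)

    def forward(self, x):
        # x: [B, H, W, C] -> [B, C, N] so the linear layer acts
        # across the token (spatial) dimension
        B, H, W, C = x.shape
        x = x.reshape(B, H * W, C).transpose(1, 2)
        x = self.fc(x)  # learned token mixing
        return x.transpose(1, 2).reshape(B, H, W, C)
```

A `spatialfc` + softmax variant, as discussed above, would amount to applying `torch.softmax` to the weight matrix before mixing; since gradient descent can already learn suitably scaled weights, the guess that it performs similarly to plain `spatialfc` is plausible.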
Huge thanks for your explanation and experience~
Welcome~