Some confusion about random mixing

Question

nbl97 opened this issue 2 years ago · 5 comments

nbl97 commented 2 years ago

Hi~ Many thanks for your excellent work and codebase. I still have some questions about the random mixing operator:

In MetaFormer v1, I noticed that random mixing is followed by Softmax, while in v2 there is no Softmax.
As I understand it, the class spatialfc corresponds to random mixing, but its weights do not appear to be frozen in the codebase.

If you could explain how random mixing works best, I would appreciate it!

yuweihao commented 2 years ago

Welcome~

Answer 1 · 2023-04-07T10:10:36.000Z

Hi @nbl97 , thanks for your attention.

For both v1 and v2, softmax is used to normalize the random matrix; see https://github.com/sail-sg/metaformer/blob/main/metaformer_baselines.py#L301
I did not put the random mixing code in the poolformer repo. The class spatialfc refers to the spatial MLP.
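
For readers following along, here is a minimal sketch of what a random mixing token mixer along these lines could look like. It is an illustration only, not the code at the link above: the class name `RandomMixingSketch`, the `(B, H, W, C)` input layout, and the choice of a non-trainable `nn.Parameter` to freeze the matrix are all assumptions.

```python
import torch
import torch.nn as nn


class RandomMixingSketch(nn.Module):
    """Hypothetical token mixer: a fixed random matrix normalized by softmax."""

    def __init__(self, num_tokens: int = 196):
        super().__init__()
        # Softmax over the last dimension makes every row sum to 1,
        # so each output token is a weighted average of the input tokens.
        matrix = torch.softmax(torch.rand(num_tokens, num_tokens), dim=-1)
        # requires_grad=False freezes the matrix (assumed way of freezing).
        self.random_matrix = nn.Parameter(matrix, requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed input layout: (batch, height, width, channels).
        B, H, W, C = x.shape
        x = x.reshape(B, H * W, C)
        # Mix along the token dimension with the frozen matrix.
        x = torch.einsum("mn,bnc->bmc", self.random_matrix, x)
        return x.reshape(B, H, W, C)


mixer = RandomMixingSketch(num_tokens=16)
out = mixer(torch.randn(2, 4, 4, 8))  # same shape as the input
```

Because the softmax is applied once at construction, it does not show up as a separate layer in the forward pass.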

Answer 2 · 2023-04-10T14:16:11.000Z

@yuweihao Thanks for your clarification! It was my mistake to confuse spatialfc with random mixing. It seems to me that spatialfc is a learnable version of random mixing (ignoring softmax). Can I infer that spatialfc outperforms random mixing? If softmax is necessary, would spatialfc+softmax achieve better performance than spatialfc? Looking forward to your insights.

Answer 3 · 2023-04-11T07:47:48.000Z

Yes, spatialfc can be regarded as a learnable version of random mixing. Thus, spatialfc will outperform random mixing because its parameters are learnable. Since random mixing's parameters cannot be learned, softmax is necessary to normalize the random matrix. I have not conducted experiments for spatialfc+softmax, so I am not sure whether it can achieve better performance than spatialfc. I guess the performance of spatialfc+softmax and spatialfc will be similar.
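
To make the comparison concrete, here is a hedged sketch of a learnable token-mixing layer with an optional softmax over its weights. The class `LearnableMixingSketch` and the `use_softmax` flag are hypothetical names for illustration; this is not the spatialfc class from the repo, and it makes no claim about which variant performs better.

```python
import torch
import torch.nn as nn


class LearnableMixingSketch(nn.Module):
    """Hypothetical learnable token mixer, optionally softmax-normalized."""

    def __init__(self, num_tokens: int = 196, use_softmax: bool = False):
        super().__init__()
        # Trainable token-to-token weights (small random init, assumed).
        self.weight = nn.Parameter(torch.randn(num_tokens, num_tokens) * 0.02)
        self.use_softmax = use_softmax

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed input layout: (batch, height, width, channels).
        B, H, W, C = x.shape
        # With use_softmax=True the rows are re-normalized on every forward
        # pass, since the weights change during training.
        w = torch.softmax(self.weight, dim=-1) if self.use_softmax else self.weight
        x = x.reshape(B, H * W, C)
        x = torch.einsum("mn,bnc->bmc", w, x)
        return x.reshape(B, H, W, C)


plain = LearnableMixingSketch(num_tokens=16)
normalized = LearnableMixingSketch(num_tokens=16, use_softmax=True)
n_trainable = sum(p.numel() for p in plain.parameters() if p.requires_grad)
```

Here `n_trainable` counts the `num_tokens * num_tokens` weights that an optimizer can update, which is the difference from a frozen random matrix.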

Answer 4 · 2023-04-11T09:09:49.000Z

Huge thanks for your explanation and experience~