Hi. Thanks for the great repo. I want to apply MoEx to text classification, and I was wondering what type of normalization I should use to compute the mean and std for MoEx on a transformer-based model like RoBERTa.
Thanks for your response. Yes, layer normalization makes sense. With layer norm, I was wondering whether the interpolation formula (injecting the moments of sample B into the normalized features of sample A) can stay the same as the one proposed in your paper.
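For concreteness, here is a minimal sketch of what I have in mind (my own assumption of how the exchange would look with layer-norm statistics, not code from this repo): normalize sample A's hidden states over the feature dimension, then re-scale and re-shift them with sample B's per-position moments. The function name `moex_layernorm` and the shapes are hypothetical.

```python
import numpy as np

def moex_layernorm(h_a, h_b, eps=1e-5):
    """Moment exchange using layer-norm statistics (a sketch, not the
    authors' implementation).

    h_a, h_b: arrays of shape (batch, seq_len, hidden). Moments are taken
    over the last (hidden) dimension, as in layer normalization.
    Returns sample A's normalized features carrying sample B's moments.
    """
    # Per-position mean/std over the hidden dimension for both samples.
    mu_a = h_a.mean(axis=-1, keepdims=True)
    sig_a = h_a.std(axis=-1, keepdims=True) + eps
    mu_b = h_b.mean(axis=-1, keepdims=True)
    sig_b = h_b.std(axis=-1, keepdims=True) + eps
    # Normalize A, then inject B's moments: sig_b * (h_a - mu_a) / sig_a + mu_b.
    return sig_b * (h_a - mu_a) / sig_a + mu_b
```

If this matches the formula in the paper (with layer-norm moments substituted for PONO's), then I assume the label interpolation with the mixing weight λ would also stay unchanged.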