Boyiliee/MoEx

MoEx normalization for text classification

Closed this issue · 2 comments

Hi. Thanks for the great repo. I want to apply MoEx to text classification. What type of normalization should I use to compute the mean and std for MoEx in a transformer-based model like RoBERTa?

Hi, @Mahhos,

We haven't tried this before, but I would guess that layer normalization is the better choice, since it is one of the most standard normalization methods in NLP.

Hope this helps.

Best,
Boyi

Thanks for your response. Yes, layer normalization makes sense. With layer norm, can the interpolation formula (injecting the moments of sample B into the normalized features of sample A) stay the same as the one you proposed in your paper?
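
For concreteness, here is a minimal sketch of what the question describes: each token's features are normalized over the hidden dimension (the layer-norm axis), and the mean and std of a partner sample are injected in their place. This is not taken from the repo and has not been confirmed by the authors. The function names, the `(batch, tokens, hidden)` shape, and the choice to pair token i of sample A with token i of sample B are all assumptions made for illustration, and NumPy is used only to keep the example framework-agnostic.

```python
import numpy as np

def layer_moments(h, eps=1e-5):
    """Per-token mean and std over the hidden (last) axis, as in layer norm."""
    mu = h.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(h.var(axis=-1, keepdims=True) + eps)
    return mu, sigma

def moex_layer_norm(h, perm, eps=1e-5):
    """Normalize each sample with its own moments, then inject the moments
    of the partner sample chosen by `perm` (hypothetical helper)."""
    mu, sigma = layer_moments(h, eps)
    h_norm = (h - mu) / sigma              # normalized features of sample A
    return h_norm * sigma[perm] + mu[perm]  # scaled and shifted by B's moments

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 16, 768))  # (batch, tokens, hidden); illustrative shapes
perm = rng.permutation(4)          # partner index B for every sample A
mixed = moex_layer_norm(h, perm)
```

If the labels are mixed as well, the loss would be interpolated between the two targets, for example `lam * loss(pred, y) + (1 - lam) * loss(pred, y[perm])`, with `lam` a hyperparameter. Padding tokens and sequences of different lengths would need extra care, since the position-wise pairing assumed here exchanges moments between whatever sits at the same token index.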
