[EMNLP22] Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models
Primary Language: Python