/Diversify-MHA

EMNLP 2018: Multi-Head Attention with Disagreement Regularization; NAACL 2019: Information Aggregation for Multi-Head Attention with Routing-by-Agreement

Primary LanguagePython