MoH: Multi-Head Attention as Mixture-of-Head Attention
Primary language: Python
License: Apache License 2.0 (Apache-2.0)