/MoE_stacked_w_Attention

Three mixture-of-experts (MoE) models whose outputs pass through multi-head attention into a final mixture-of-experts model.
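The description above can be read as a small forward-pass sketch. This is not the repository's code: it is a minimal NumPy illustration under several assumptions that the description does not confirm. It assumes that all three first-stage MoE models see the same input, that gating is dense (softmax-weighted over all experts), that each expert is a single linear map, and that the three MoE outputs are treated as a three-token sequence for self-attention. The names `DenseMoE`, `MultiHeadAttention`, and `StackedMoE`, and all dimensions, are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class DenseMoE:
    """Hypothetical dense MoE: softmax gate over linear experts (assumption)."""

    def __init__(self, d_in, d_out, n_experts, rng):
        self.w_gate = rng.normal(0, 0.1, (d_in, n_experts))
        self.w_experts = rng.normal(0, 0.1, (n_experts, d_in, d_out))

    def __call__(self, x):
        gates = softmax(x @ self.w_gate)                    # (batch, n_experts)
        outs = np.einsum("bi,eio->beo", x, self.w_experts)  # (batch, n_experts, d_out)
        return np.einsum("be,beo->bo", gates, outs)         # gate-weighted sum of experts


class MultiHeadAttention:
    """Standard scaled dot-product self-attention with several heads."""

    def __init__(self, d_model, n_heads, rng):
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0, 0.1, (d_model, d_model)) for _ in range(4)
        )

    def _split(self, t):
        # (batch, seq, d_model) -> (batch, heads, seq, d_head)
        b, s, _ = t.shape
        return t.reshape(b, s, self.n_heads, self.d_head).transpose(0, 2, 1, 3)

    def __call__(self, x):
        q, k, v = (self._split(x @ w) for w in (self.wq, self.wk, self.wv))
        scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(self.d_head)
        ctx = softmax(scores) @ v                           # (batch, heads, seq, d_head)
        b, h, s, d = ctx.shape
        merged = ctx.transpose(0, 2, 1, 3).reshape(b, s, h * d)
        return merged @ self.wo                             # (batch, seq, d_model)


class StackedMoE:
    """Three MoE branches -> multi-head attention -> final MoE (assumed wiring)."""

    def __init__(self, d_in, d_model, d_out, n_experts=4, n_heads=2, seed=0):
        rng = np.random.default_rng(seed)
        self.branches = [DenseMoE(d_in, d_model, n_experts, rng) for _ in range(3)]
        self.attn = MultiHeadAttention(d_model, n_heads, rng)
        self.head = DenseMoE(3 * d_model, d_out, n_experts, rng)

    def __call__(self, x):
        # Treat the three branch outputs as a 3-token sequence (assumption).
        tokens = np.stack([moe(x) for moe in self.branches], axis=1)
        attended = self.attn(tokens)                        # (batch, 3, d_model)
        return self.head(attended.reshape(len(x), -1))      # (batch, d_out)


model = StackedMoE(d_in=8, d_model=16, d_out=2)
y = model(np.ones((5, 8)))  # forward pass only; weights are random and untrained
```

Flattening the attended tokens before the final MoE is one possible choice; pooling them (for example, taking the mean over the three tokens) would keep the final model's input size independent of the number of branches.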

Primary language: Jupyter Notebook
