MotorCityCobra/MoE_stacked_w_Attention
Three mixture-of-experts models feed through multi-head attention into a final mixture-of-experts model
Jupyter Notebook
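The description above names an architecture but gives no details, so the following is only a minimal sketch of one way it could be wired, not the repository's actual code. It assumes that all three branch models receive the same input, that their outputs are treated as a three-token sequence for self-attention, and that the attended tokens are concatenated before the final model. Dense softmax gating, linear experts, and every name here (`MoE`, `MultiHeadAttention`, `StackedMoE`, `d_model`, `n_experts`) are hypothetical choices made for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class MoE:
    """Dense mixture of experts: a softmax gate weights linear experts (assumed design)."""

    def __init__(self, d_in, d_out, n_experts):
        self.gate = rand_matrix(n_experts, d_in)
        self.experts = [rand_matrix(d_out, d_in) for _ in range(n_experts)]

    def __call__(self, x):
        weights = softmax(matvec(self.gate, x))
        outputs = [matvec(expert, x) for expert in self.experts]
        # Gate-weighted sum of the expert outputs, one coordinate at a time.
        return [sum(w * out[i] for w, out in zip(weights, outputs))
                for i in range(len(outputs[0]))]


class MultiHeadAttention:
    """Scaled dot-product self-attention over a short sequence of vectors."""

    def __init__(self, d_model, n_heads):
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.wq, self.wk, self.wv, self.wo = (
            rand_matrix(d_model, d_model) for _ in range(4))

    def __call__(self, tokens):
        q = [matvec(self.wq, t) for t in tokens]
        k = [matvec(self.wk, t) for t in tokens]
        v = [matvec(self.wv, t) for t in tokens]
        out = [[] for _ in tokens]
        for h in range(self.n_heads):
            lo, hi = h * self.d_head, (h + 1) * self.d_head
            for i in range(len(tokens)):
                scores = [sum(a * b for a, b in zip(q[i][lo:hi], k[j][lo:hi]))
                          / math.sqrt(self.d_head) for j in range(len(tokens))]
                attn = softmax(scores)
                # Append this head's slice, so heads end up concatenated in order.
                out[i] += [sum(attn[j] * v[j][c] for j in range(len(tokens)))
                           for c in range(lo, hi)]
        return [matvec(self.wo, o) for o in out]


class StackedMoE:
    """Three branch MoEs -> multi-head attention -> final MoE (assumed wiring)."""

    def __init__(self, d_in, d_model, d_out, n_experts=4, n_heads=2):
        self.branches = [MoE(d_in, d_model, n_experts) for _ in range(3)]
        self.attention = MultiHeadAttention(d_model, n_heads)
        self.head = MoE(3 * d_model, d_out, n_experts)

    def __call__(self, x):
        tokens = [branch(x) for branch in self.branches]  # one token per branch
        attended = self.attention(tokens)
        flat = [value for token in attended for value in token]  # concatenate tokens
        return self.head(flat)


model = StackedMoE(d_in=8, d_model=4, d_out=2)
y = model([0.1 * i for i in range(8)])
print(len(y))
```

The sketch uses only the standard library to keep the data flow visible; a trainable version would more likely be written with a tensor library, and could pool the attended tokens instead of concatenating them.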