Harzva opened this issue 2 years ago · 1 comments
Describe the question: Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
In the MoE method, do the experts have to be trained, or can a frozen pretrained model such as GPT-3 or BERT be used as an expert?
Thank you very much.