How to use this layer in a sequence setting?
agupta54 opened this issue · 0 comments
Hi, I am trying to use the MOE class in the decoder of a transformer architecture, where I want to replace the feed-forward step with a mixture of experts. The class expects input of shape [batch, input_size]. In my case the sequence length varies from step to step, which leads to a variable input size. How can I use this class in that case?
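
One possible approach, sketched under assumptions: a transformer feed-forward step is applied to each token independently, so the sequence dimension can be folded into the batch dimension before calling the layer and unfolded afterwards. With that, input_size stays equal to the model's hidden size and only the number of rows changes with the sequence length. The `TokenwiseWrapper` name and the `nn.Linear` stand-in below are hypothetical; the real MOE class would take the stand-in's place, and if it returns extra values (for example an auxiliary loss) those would need to be unpacked before reshaping.

```python
import torch
import torch.nn as nn


class TokenwiseWrapper(nn.Module):
    """Hypothetical helper: applies a layer that expects [N, input_size]
    to a tensor of shape [batch, seq_len, input_size]."""

    def __init__(self, layer: nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, input_size = x.shape
        # Fold the sequence dimension into the batch dimension.
        flat = x.reshape(batch * seq_len, input_size)
        # Assumption: the wrapped layer returns a single tensor of shape [N, output_size].
        out = self.layer(flat)
        # Restore the [batch, seq_len, output_size] layout.
        return out.reshape(batch, seq_len, -1)


# Stand-in for the mixture-of-experts layer (assumption, not the real class).
stand_in = nn.Linear(16, 16)
wrapped = TokenwiseWrapper(stand_in)

# The same wrapped layer handles different sequence lengths.
short = wrapped(torch.randn(2, 5, 16))
long = wrapped(torch.randn(2, 9, 16))
```

Padded positions, if any, would still pass through the layer in this sketch; masking them out beforehand is a separate design choice.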