lucidrains/mixture-of-experts

RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long

littlepan0413 opened this issue · 1 comment

Code:

import torch
from mixture_of_experts import HeirarchicalMoE

moe = HeirarchicalMoE(
    dim = 512,
    num_experts = (4, 4),   # 4 gates on the first layer, then 4 experts on the second, equaling 16 experts
)

inputs = torch.randn(4, 1024, 512)
out, aux_loss = moe(inputs) # (4, 1024, 512), (1,)

Output:

Traceback (most recent call last):
  File "/home/bi/panlu/ComplexQG-MOE/test/test3.py", line 20, in <module>
    out, aux_loss = moe(inputs) # (4, 1024, 512), (1,)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/mixture_of_experts/mixture_of_experts.py", line 254, in forward
    dispatch_tensor, combine_tensor, loss = self.gate(inputs)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/mixture_of_experts/mixture_of_experts.py", line 217, in forward
    * safe_one_hot(position_in_expert_1.long(), expert_capacity)[..., None, :] +
RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long
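For context, a minimal sketch of what this error pattern usually means: `one_hot` produces a Long (`int64`) tensor, and older PyTorch releases (roughly pre-1.5, before implicit type promotion) refused to multiply a Float tensor by a Long tensor. This is an assumption about the root cause, not a confirmed diagnosis of the library code; the variable names below are illustrative only:

```python
import torch
import torch.nn.functional as F

# Float tensor, standing in for gate probabilities
gates = torch.randn(4, 8)

# Long tensor of positions, standing in for position_in_expert
position = torch.randint(0, 8, (4,))

# one_hot always returns int64, regardless of input context
one_hot = F.one_hot(position, num_classes=8)
print(one_hot.dtype)  # torch.int64

# On old PyTorch, `gates * one_hot` raised exactly this kind of
# "expected ... dtype Float but got ... dtype Long" RuntimeError.
# Casting the one-hot to float first is the usual workaround:
result = gates * one_hot.float()
print(result.dtype)   # torch.float32
```

If upgrading PyTorch is not an option, inserting a `.float()` cast at the failing multiplication is a plausible local fix.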

@littlepan0413 It works for me. What version of PyTorch are you using?