lucidrains/mixture-of-experts
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
Python · MIT License
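For context on the interface the issues below refer to: following the repository README, the layer is instantiated as an `MoE` module that returns both the transformed tokens and an auxiliary balancing loss. A minimal sketch; the hyperparameter values here are illustrative, and the README documents further arguments:

```python
import torch
from mixture_of_experts import MoE

# illustrative hyperparameters; see the README for the full argument list
moe = MoE(
    dim = 512,        # token dimension
    num_experts = 16  # number of expert feedforward networks
)

inputs = torch.randn(4, 1024, 512)   # (batch, sequence, dim)
out, aux_loss = moe(inputs)          # out keeps the input shape; add aux_loss to the training loss
```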
Issues
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#5 opened by mxs30443
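This error means one tensor or submodule stayed on the CPU while the rest of the computation ran on cuda:0. The issue does not include a traceback, so here is only a generic PyTorch sketch of the usual fix, not specific to this repo's internals:

```python
import torch
from torch import nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(512, 512).to(device)     # .to() moves every registered parameter and buffer
x = torch.randn(8, 512, device = device)   # allocate inputs on the same device

out = model(x)   # a CPU tensor meeting CUDA weights here would raise the error above
```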
PEER implementation
#11 opened by huu4ontocord
Load balancing loss?
#10 opened by Aman-Goel1
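The Sparsely-Gated MoE paper (Shazeer et al., 2017) that this repo implements balances experts with a squared coefficient-of-variation penalty on expert importance and load. A minimal sketch of that idea; `cv_squared`, `gates`, `importance`, and `load` are illustrative names, not necessarily this repo's internals:

```python
import torch

def cv_squared(x, eps = 1e-10):
    # squared coefficient of variation: variance / mean**2,
    # small when every expert receives a similar share
    x = x.float()
    return x.var() / (x.mean() ** 2 + eps)

gates = torch.rand(1024, 16).softmax(dim = -1)   # (tokens, experts); with real top-k gating most entries are zero
importance = gates.sum(dim = 0)                  # total gate weight assigned to each expert
load = (gates > 0).float().sum(dim = 0)          # number of tokens routed to each expert
aux_loss = cv_squared(importance) + cv_squared(load)
```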
Would you elaborate more on the enhancement?
#9 opened by yhyu13
convolution operation
#8 opened by Yonsun-w
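The layer operates on (batch, sequence, dim) tokens, so one plausible way to combine it with convolutions is to flatten spatial positions into a sequence before routing. A hypothetical sketch, reusing the `MoE` interface assumed above:

```python
import torch
from mixture_of_experts import MoE   # interface as sketched above

moe = MoE(dim = 512, num_experts = 16)

feats = torch.randn(4, 512, 16, 16)          # (batch, channels, height, width) from a conv layer
b, c, h, w = feats.shape

tokens = feats.flatten(2).transpose(1, 2)    # (batch, h*w, channels): one token per spatial position
out, aux_loss = moe(tokens)                  # route each spatial position through the experts
feats = out.transpose(1, 2).reshape(b, c, h, w)   # restore the feature-map layout
```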
Implicit in-place operation '*=' causes an error when deriving the backward gradient in PyTorch
#6 opened by VRCMF
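Operators like `*=` mutate a tensor in place; if autograd saved that tensor for the backward pass, backpropagation fails. A minimal reproduction and the usual out-of-place fix:

```python
import torch

x = torch.ones(3, requires_grad = True)
y = x.exp()        # exp() saves its output for the backward pass

# y *= 3           # in-place: backward() would raise "one of the variables needed
#                  # for gradient computation has been modified by an inplace operation"

y = y * 3          # out-of-place fix: builds a new tensor instead of mutating y
y.sum().backward()
```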
Error reported under FP16 training
#3 opened by SefaZeng
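The issue title does not say which op failed. A common cause of FP16 failures is casting the whole model to half precision instead of using mixed precision; a generic sketch with `torch.cuda.amp`, not specific to this repo:

```python
import torch
from torch import nn

model = nn.Linear(512, 512).cuda()
opt = torch.optim.SGD(model.parameters(), lr = 1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(8, 512, device = 'cuda')
    with torch.cuda.amp.autocast():   # run ops in fp16 where safe, keep the rest in fp32
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()     # scale the loss so fp16 gradients don't underflow
    scaler.step(opt)
    scaler.update()
    opt.zero_grad()
```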
RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long
#2 opened by littlepan0413
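This error comes from mixing a Float tensor with a Long (integer) tensor in a single op; PyTorch versions of that era did not promote integer tensors automatically. A minimal illustration with hypothetical `gates`/`mask` tensors:

```python
import torch

gates = torch.rand(4, 16)              # float tensor
mask = torch.randint(0, 2, (4, 16))    # integer (Long) tensor

# gates * mask                         # raised the dtype error on older PyTorch versions
out = gates * mask.float()             # cast explicitly so both operands share a dtype
```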
Segmentation Fault?
#1 opened by SungMinCho