IBM/ModuleFormer
ModuleFormer is a mixture-of-experts (MoE) architecture with two types of experts: stick-breaking attention heads and feedforward experts. We released MoLM, a collection of ModuleFormer-based language models ranging in scale from 4 billion to 8 billion parameters.
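The two expert types can be sketched in a few lines. The following is an illustrative NumPy sketch, not the repo's actual code: `moe_forward` shows standard top-k routing over feedforward experts, and `stick_breaking_weights` shows the stick-breaking alternative to softmax attention, where nearer keys claim their share of the attention "stick" first. All function names, shapes, and the exact masking convention are our own assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Route each token to its top-k feedforward experts (illustrative sketch).

    x: (tokens, d) inputs; gate_w: (d, n_experts) router weights;
    expert_ws: list of (d, d) expert weight matrices (hypothetical shapes).
    """
    logits = x @ gate_w                                   # router score per expert
    top = np.argsort(logits, axis=1)[:, -top_k:]          # indices of the top-k experts
    sel = np.take_along_axis(logits, top, axis=1)         # scores of selected experts only
    sel = np.exp(sel - sel.max(axis=1, keepdims=True))    # softmax over the selected scores
    weights = sel / sel.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                           # mix the chosen experts' outputs
        for k in range(top_k):
            e = top[t, k]
            out[t] += weights[t, k] * np.maximum(x[t] @ expert_ws[e], 0.0)  # ReLU expert
    return out

def stick_breaking_weights(logits):
    """Causal stick-breaking attention weights (illustrative sketch).

    For query i, walking from the nearest key backwards, key j receives
    A[i, j] = sigmoid(z[i, j]) * prod over nearer keys k of (1 - sigmoid(z[i, k])).
    Unlike softmax, the weights need not sum exactly to 1.
    """
    q, kv = logits.shape
    beta = 1.0 / (1.0 + np.exp(-logits))                  # per-key "break" fraction
    A = np.zeros_like(beta)
    for i in range(q):
        remaining = 1.0                                   # unclaimed portion of the stick
        for j in range(min(i, kv - 1), -1, -1):           # nearest key first
            A[i, j] = beta[i, j] * remaining
            remaining *= 1.0 - beta[i, j]
    return A
```

With zero logits every `beta` is 0.5, so each query gives half the stick to its nearest key, a quarter to the next, and so on; the row sums stay at or below 1 rather than being normalized to it.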
Python · Apache-2.0
Issues

- Will you open the train code? (#11, opened by Felix-fz · 0 comments)
- Length exploration (#9, opened by XXares · 2 comments)
- the affects of cumulativing aux loss steps (#5, opened by zjwang21 · 6 comments)
- torch.dtype is not respected (#6, opened by Vectorrent · 3 comments)
- Context length (#4, opened by flozi00 · 12 comments)